Economics 704
Steven N. Durlauf
Fall 2015
Lecture Notes 5. Mathematics of Time Series
1. Basic ideas

The Hilbert space framework provides a very powerful language for discussing the relationships among random variables. Collections of random variables are called stochastic processes; in common usage, stochastic processes are usually understood to refer to collections of random variables whose elements are indexed by time.
We focus on the case of a scalar stochastic process $x_t$, where $t$ is an integer, as it is convenient to think of time as stretching from $-\infty$ to $\infty$. We assume that the process is zero-mean and second-order stationary. Second-order stationarity, also known as weak stationarity, means that the autocovariances between $x_{t-j}$ and $x_t$ do not depend on $t$. Formally, assume i) $E\, x_t = 0$ and ii) $E\, x_t x_{t-j} = \gamma_j$. For random variables such as the elements of the stochastic process $x_t$, the natural metric, i.e. notion of length, for a random variable is its standard deviation,

$$\|x_t\| = \left(E\, x_t^2\right)^{1/2} \tag{5.1}$$

with covariance as the associated notion of inner product,

$$\langle x_t, x_{t-j} \rangle = E\, x_t x_{t-j} \tag{5.2}$$
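To make these definitions concrete, here is a minimal Python sketch that checks conditions i) and ii) on simulated data; the AR(1) process and its closed-form autocovariance $\gamma_j = \rho^j/(1-\rho^2)$ are illustrative choices, not part of the development above.

```python
import numpy as np

def sample_autocov(x, j):
    """Sample analogue of gamma_j = E[x_t x_{t-j}] for a zero-mean series."""
    x = np.asarray(x, dtype=float)
    return x[j:] @ x[:len(x) - j] / len(x)

# Simulate a weakly stationary AR(1) process, x_t = 0.8 x_{t-1} + eps_t.
rng = np.random.default_rng(0)
n, rho = 100_000, 0.8
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]

# For this process gamma_j = rho**j / (1 - rho**2), the same for every t.
for j in range(4):
    print(j, sample_autocov(x, j), rho**j / (1 - rho**2))
```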
One can generate a Hilbert space around the sequence $x_t, x_{t-1}, x_{t-2}, \ldots$ What this means is that one forms a space by taking these elements, adding all linear combinations of the elements, all limits of those linear combinations, etc.¹ We denote this Hilbert space as $H_t(x)$. The entire history of the stochastic process from $-\infty$ to $\infty$ generates $H(x)$. By construction, $H_{t-1}(x) \subseteq H_t(x)$. The general properties of Hilbert spaces described in Lecture Notes 3 allow one to characterize the linear structure of $H_t(x)$ in ways that are very important in macroeconomics.
First, observe that by the Hilbert space decomposition theorem, one can decompose $H_t(x)$ so that

$$H_t(x) = H_{t-1}(x) \oplus G_t \tag{5.3}$$
where $G_t$ is another Hilbert space. The dimension of this Hilbert space is either 0 or 1. This is so because the Hilbert space $G_t$ must be spanned by the single random variable $x_t - \mathrm{proj}\left(x_t \mid H_{t-1}(x)\right)$, where $\mathrm{proj}\left(x_t \mid H_{t-1}(x)\right)$ is the projection of $x_t$ onto $H_{t-1}(x)$. To say the space $G_t$ has dimension 0 means that $x_t \in H_{t-1}(x)$, i.e. $\mathrm{var}\left(x_t - \mathrm{proj}\left(x_t \mid H_{t-1}(x)\right)\right) = 0$.
If one again applies the Hilbert space decomposition theorem, it is the case that

$$H_t(x) = H_{t-2}(x) \oplus G_{t-1} \oplus G_t \tag{5.4}$$

where $G_{t-1}$ is spanned by $x_{t-1} - \mathrm{proj}\left(x_{t-1} \mid H_{t-2}(x)\right)$. This decomposition may be repeated an arbitrary number of times; the $G_{t-j}$ spaces are by construction mutually orthogonal.
¹ Technically, we are working with the smallest Hilbert space that contains the elements of the stochastic process.
Notice that it is not necessarily the case that, if this decomposition is carried out an infinite number of times, the $G_t$'s may be used to reconstruct $H_t(x)$. The reason for this is that each space is constituted by elements that appear in the space $H_{t-j}(x)$ but not in the space $H_{t-j-1}(x)$; if there are elements that appear in every member of the sequence $H_t(x), H_{t-1}(x), \ldots$, they will not appear in any of the $G_t$'s. Elements that are common to all of the $H_t(x)$'s form a Hilbert space as well. Formally, this space is defined as

$$H_{-\infty}(x) = \bigcap_t H_t(x) \tag{5.5}$$

The Hilbert space generated by current and past $x_t$'s can therefore be decomposed as

$$H_t(x) = G_t \oplus G_{t-1} \oplus \cdots \oplus H_{-\infty}(x) \tag{5.6}$$
2. Wold decomposition theorems
Equation (5.6) is the basis for two fundamental theorems in time series analysis, each due to Herman Wold; his 1948 article is still worth reading. Rozanov (1967) is a deep treatment. I find Ash and Gardner's (1975) discussion to be especially insightful.
Theorem 5.1. Wold decomposition theorem I

Any zero-mean, finite-variance, second-order stationary process $x_t$ may be decomposed as

$$x_t = x_{1t} + x_{2t} \tag{5.7}$$

where

$$x_{1t} \in G_t \oplus G_{t-1} \oplus G_{t-2} \oplus \cdots \tag{5.8}$$

and

$$x_{2t} \in H_{-\infty}(x) \tag{5.9}$$

In this decomposition, $x_{1t}$ is called the indeterministic component and $x_{2t}$ the deterministic component of $x_t$.
The terms refer to whether or not the components may be perfectly (linearly) predicted from the past. When a time series contains a nontrivial indeterministic component, the time series itself is said to be indeterministic. If the process does not contain a deterministic component, it is purely indeterministic. The term $x_{2t}$ may be perfectly predicted from information in the arbitrarily distant past. One example of a deterministic component is a seasonal. Consider the stochastic process $\cos(\lambda t + \theta)$, where $\theta$ is uniformly distributed on $[-\pi, \pi]$. Since $\int_{-\pi}^{\pi} \cos(\lambda t + \theta)\, \frac{d\theta}{2\pi} = 0$ and $\int_{-\pi}^{\pi} \cos(\lambda(t+j) + \theta)\cos(\lambda t + \theta)\, \frac{d\theta}{2\pi}$ does not depend on $t$,² $\cos(\lambda t + \theta)$ is a candidate for $x_{2t}$. The second Wold theorem characterizes the linear structure of the indeterministic part of a time series.
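A quick Monte Carlo check of this example (the frequency value $\lambda = 0.7$ is an arbitrary illustrative choice): both moments should be invariant to $t$.

```python
import numpy as np

# Monte Carlo check: for theta ~ U[-pi, pi], E[cos(lam*t + theta)] = 0 and
# E[cos(lam*(t+j) + theta) * cos(lam*t + theta)] = cos(lam*j)/2 for every t.
rng = np.random.default_rng(1)
lam, n_draws = 0.7, 500_000          # lam is an arbitrary frequency
theta = rng.uniform(-np.pi, np.pi, n_draws)

for t in (0, 5, 50):                 # the moments should not vary with t
    for j in (0, 1, 2):
        mean = np.cos(lam * t + theta).mean()
        acov = (np.cos(lam * (t + j) + theta) * np.cos(lam * t + theta)).mean()
        print(t, j, round(mean, 4), round(acov, 4), round(np.cos(lam * j) / 2, 4))
```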
² This follows from the identity $\cos(\lambda(t+j) + \theta)\cos(\lambda t + \theta) = \frac{1}{2}\left[\cos(\lambda j) + \cos(\lambda(2t+j) + 2\theta)\right]$.
Theorem 5.2. Wold decomposition theorem II
If $x_t$ is a purely indeterministic, zero-mean, finite-variance, second-order stationary process, then there exists a representation of the process of the form

$$x_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}, \qquad \psi_0 = 1 \tag{5.10}$$

where $\varepsilon_t \in G_t$, $E\, \varepsilon_t^2 = \sigma^2$, and $E\, \varepsilon_t \varepsilon_{t-j} = 0$ for all $j \neq 0$. The sum $\sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$ is known as the fundamental moving average (MA) representation of $x_t$ and is unique.
Pf. Since $H_t(x) = H_{t-1}(x) \oplus G_t$, by construction $G_t$ is a Hilbert space of dimension at most 1; the space is spanned by $\varepsilon_t$, which is a scalar random variable. If the dimension were zero (which would mean that $\mathrm{var}(\varepsilon_t) = 0$), then $H_t(x) = H_{t-1}(x)$, which contradicts the assumption that the process is purely indeterministic. Since the process is indeterministic, one may find an element $\varepsilon_t$ in $G_t$ such that the projection of $x_t$ onto $G_t$ is $\varepsilon_t$; this is nothing more than a choice of axis for the one-dimensional space. For the spaces $G_{t-j}$ ($j > 0$), one can find an element in each of them, denoted $\varepsilon_{t-j}$, whose variance equals that of $\varepsilon_t$; each $\varepsilon_{t-j}$ spans its respective space. Since $x_t$ is purely indeterministic, $H_t(x) = G_t \oplus G_{t-1} \oplus \cdots$. Letting $\mathrm{proj}\left(x_t \mid G_{t-j}\right)$ denote the projection of $x_t$ onto $G_{t-j}$, by the Hilbert space projection theorem

$$x_t = \sum_{j=0}^{\infty} \mathrm{proj}\left(x_t \mid G_{t-j}\right) = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} \tag{5.11}$$

where the second equality follows from the definition of the $\varepsilon_t$'s. This verifies the theorem, except for uniqueness.
To prove uniqueness, suppose that there existed another MA representation $x_t = \sum_{j=0}^{\infty} \tilde{\psi}_j \varepsilon_{t-j}$. For this to be the case, the variance of $\sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} - \sum_{j=0}^{\infty} \tilde{\psi}_j \varepsilon_{t-j}$ must equal zero, since by assumption the two parts of the expression are the same. The variance of this expression equals $\sigma^2 \sum_{j=0}^{\infty} \left(\psi_j - \tilde{\psi}_j\right)^2$, which equals zero iff $\psi_j - \tilde{\psi}_j = 0 \;\; \forall j$.
An immediate implication of the second Wold theorem is that the fundamental moving average coefficients are square summable, i.e. $\sum_{j=0}^{\infty} \psi_j^2 < \infty$. This follows from the fact that

$$\mathrm{var}(x_t) = \mathrm{var}\left(\sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}\right) = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2$$

and the assumption that $\sigma_x^2 < \infty$.
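A small simulation of this variance identity, under the illustrative assumptions $\psi_j = 0.9^j$ (square summable) and $\sigma^2 = 1$, with the infinite sum truncated at a long lag:

```python
import numpy as np

# Check var(x_t) = sigma^2 * sum_j psi_j^2 for a truncated MA(infinity).
rng = np.random.default_rng(2)
n, trunc = 200_000, 200
psi = 0.9 ** np.arange(trunc)          # square-summable MA coefficients
eps = rng.standard_normal(n + trunc)   # fundamental innovations, sigma = 1

# x_t = sum_{j < trunc} psi_j * eps_{t-j}, computed by convolution.
x = np.convolve(eps, psi, mode="valid")[:n]

print(x.var())              # sample variance of x_t
print((psi ** 2).sum())     # sum psi_j^2 = 1/(1 - 0.81), about 5.26
```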
What is meant by the term fundamental in the description of the moving average representation? It is a way of designating a particular orthogonal basis for $H_t(x)$. There exist an uncountable infinity of different orthogonal bases for $H_t(x)$, just as there are for spaces such as $R^k$. As far as I know, the term is taken from Rozanov (1967); it was popularized by Christopher Sims. For a purely indeterministic process, there is an equivalence between the Hilbert space generated around the stochastic process $x_t$ and the Hilbert space generated around the fundamental innovations $\varepsilon_t$.
Corollary 5.1. Equivalence between the Hilbert space of a purely indeterministic time series and its associated fundamental innovations.
Assume $x_t$ is zero-mean, weakly stationary, and purely indeterministic. Let $H_t(\varepsilon)$ denote the Hilbert space generated by $\varepsilon_t, \varepsilon_{t-1}, \ldots$, the fundamental moving average errors. Then $H_t(\varepsilon) = H_t(x)$.
Pf. This is left as an exercise; the proof amounts to showing that each of the two Hilbert spaces is a subset of the other.
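To make the corollary concrete, consider an invertible MA(1) (an illustrative special case, not the general argument): the fundamental innovations can be recovered exactly from current and past $x$'s by the recursion $\varepsilon_t = x_t - \theta \varepsilon_{t-1}$, which is the substance of $H_t(\varepsilon) \subseteq H_t(x)$; the MA representation itself gives the reverse inclusion. A minimal sketch:

```python
import numpy as np

# x_t = eps_t + theta * eps_{t-1} with |theta| < 1 (invertible, illustrative).
rng = np.random.default_rng(3)
theta, n = 0.5, 1_000
eps = rng.standard_normal(n)
x = eps.copy()
x[1:] += theta * eps[:-1]              # x_0 = eps_0 here (no pre-sample shock)

# Recover the innovations from the x's alone: eps_t = x_t - theta * eps_{t-1}.
eps_hat = np.zeros(n)
eps_hat[0] = x[0]
for t in range(1, n):
    eps_hat[t] = x[t] - theta * eps_hat[t - 1]

print(np.max(np.abs(eps_hat - eps)))   # 0: each eps_t lies in H_t(x)
```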
3. Prediction
We now consider the question of how to optimally predict a time series given its history. Let $\hat{x}_{t|t-j}$ denote the projection of $x_t$ onto $H_{t-j}(x)$. This projection is important in that it is also the solution to the linear prediction problem for $x_t$ relative to the information set $H_{t-j}(x)$.
Theorem 5.3. Optimal linear predictor

The projection $\hat{x}_{t+k|t}$ is the solution to

$$\min_{\phi \in H_t(x)} E\left(x_{t+k} - \phi\right)^2$$

Pf. Let $\phi$ solve the minimization problem. The prediction error equals $x_{t+k} - \phi = \left(x_{t+k} - \hat{x}_{t+k|t}\right) + \left(\hat{x}_{t+k|t} - \phi\right)$. The variance of this term will equal $\sigma^2_{x_{t+k} - \hat{x}_{t+k|t}} + \sigma^2_{\hat{x}_{t+k|t} - \phi}$, since $x_{t+k} - \hat{x}_{t+k|t}$ is orthogonal to $\hat{x}_{t+k|t} - \phi$. This variance must exceed $\sigma^2_{x_{t+k} - \hat{x}_{t+k|t}}$ unless $\hat{x}_{t+k|t} - \phi$ is zero. Uniqueness of the projection then verifies the result.

This theorem implies that $\varepsilon_t$ is the forecast error associated with the optimal (in a minimum variance sense) linear forecast of $x_t$ given the information set $H_{t-1}(x)$. The term linear means that the forecast has to be an element of the space, and so is either a linear combination of $x_{t-1}, x_{t-2}, \ldots$ or the limit of such a sequence.
Hence, one can think of a time series as a weighted average of current and past forecast errors. This is intuitive, since these forecast errors reveal aspects of the process that are realized each time period. The optimal linear predictor theorem provides insight into the nature of information sets and associated predictions. By construction, $\hat{x}_{t|t-j} \in H_{t-j}(x)$ and $x_t - \hat{x}_{t|t-j} \in G_t \oplus G_{t-1} \oplus \cdots \oplus G_{t-j+1}$. The fundamental innovations $\varepsilon_t, \ldots, \varepsilon_{t-j+1}$ represent the part of $x_t$ that is revealed after the forecast. The fundamental innovations can be thought of as information increments between time periods. From the perspective of the Wold theorems, the moving average representation of a time series can thus be seen as a natural way of thinking about its underlying linear structure; much of time series analysis is based on this perspective, which is often referred to as the time domain approach. That said, nothing in these derivations rules out the presence of nonlinear structure in the stochastic process.

What is the structure of a $k$-step-ahead forecast? Since $x_{t+k} = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t+k-j}$, the equivalence of $H_t(x)$ and $H_t(\varepsilon)$ makes it immediate that $\hat{x}_{t+k|t}$ may be written as

$$\hat{x}_{t+k|t} = \sum_{j=0}^{\infty} \psi_{j+k} \varepsilon_{t-j} \tag{5.12}$$
The reason for this is simple: at time $t$, one can condition forecasts on $\varepsilon_t, \varepsilon_{t-1}, \ldots$, i.e. $H_t(\varepsilon)$. The component of $x_{t+k}$ that is determined by $\varepsilon_{t+k}, \ldots, \varepsilon_{t+1}$, i.e. $\sum_{j=0}^{k-1} \psi_j \varepsilon_{t+k-j}$, is orthogonal to $H_t(\varepsilon)$, and so its linear projection onto this space is 0. Put differently, at time $t$ one has no basis for constructing predictions of the future shocks.
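Equation (5.12) translates directly into code: the $k$-step-ahead forecast shifts the moving average coefficients forward by $k$ and weights the observed innovation history. The coefficients $\psi_j = 0.9^j$ and the truncation point in this sketch are illustrative assumptions.

```python
import numpy as np

def k_step_forecast(psi, eps_history, k):
    """x_hat_{t+k|t} = sum_{j>=0} psi_{j+k} * eps_{t-j}, as in (5.12).

    eps_history holds eps_t, eps_{t-1}, ... (most recent first)."""
    m = min(len(psi) - k, len(eps_history))
    if m <= 0:
        return 0.0
    return float(np.dot(psi[k:k + m], eps_history[:m]))

psi = 0.9 ** np.arange(200)               # illustrative truncated MA weights
rng = np.random.default_rng(4)
eps_history = rng.standard_normal(200)    # eps_t, eps_{t-1}, ...

for k in (1, 2, 5):
    print(k, k_step_forecast(psi, eps_history, k))
```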
To achieve a more parsimonious expression for (5.12), I introduce the annihilation operator.
Definition 5.1. Annihilation operator

Let $\alpha(L)$ be a polynomial lag operator of the form

$$\alpha(L) = \cdots + \alpha_{-2} L^{-2} + \alpha_{-1} L^{-1} + \alpha_0 + \alpha_1 L + \alpha_2 L^2 + \cdots \tag{5.13}$$

The annihilation operator, denoted $\left[\,\cdot\,\right]_+$, eliminates all lag terms with negative exponents, i.e.

$$\left[\alpha(L)\right]_+ = \alpha_0 + \alpha_1 L + \alpha_2 L^2 + \cdots \tag{5.14}$$
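In code, a lag polynomial can be represented as a map from exponents to coefficients, in which case the annihilation operator simply discards the negative-exponent terms. A minimal sketch (the dictionary representation is my own convention, not standard notation):

```python
# Represent alpha(L) = sum_j alpha_j L^j as a dict {exponent: coefficient}.
def annihilate(poly):
    """[alpha(L)]_+ : keep only the terms with nonnegative exponents."""
    return {j: a for j, a in poly.items() if j >= 0}

# Example: alpha(L) = 2 L^-2 + 1 L^-1 + 3 + 4 L + 5 L^2.
alpha = {-2: 2.0, -1: 1.0, 0: 3.0, 1: 4.0, 2: 5.0}
print(annihilate(alpha))    # {0: 3.0, 1: 4.0, 2: 5.0}
```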
Using the annihilation operator, one can re-express the optimal predictor as

$$\hat{x}_{t+k|t} = \left[\frac{\psi(L)}{L^k}\right]_+ \varepsilon_t \tag{5.15}$$

When the MA polynomial is invertible, $\psi(L)^{-1} x_t = \varepsilon_t$, so that one can explicitly express the predictor as a weighted average of current and lagged $x_t$'s; (5.15) implies

$$\hat{x}_{t+k|t} = \left[\frac{\psi(L)}{L^k}\right]_+ \psi(L)^{-1} x_t \tag{5.16}$$

This formula expresses forecasts as weighted averages of observables. Equations (5.15) and (5.16) are known as the Wiener-Kolmogorov prediction formulas, named after their independent discoverers Norbert Wiener and Andrei Kolmogorov.
Example 5.1. MA(1) process

Let $x_t = \varepsilon_t + \theta \varepsilon_{t-1}$. Then the Wiener-Kolmogorov formula for the optimal linear prediction of the process is

$$\hat{x}_{t+k|t} = \left[\frac{1 + \theta L}{L^k}\right]_+ (1 + \theta L)^{-1} x_t \tag{5.17}$$

For $k = 1$ this gives

$$\hat{x}_{t+1|t} = \theta (1 + \theta L)^{-1} x_t \tag{5.18}$$

while (5.17) equals 0 for $k > 1$, as one would expect since the process is unaffected by shocks more than one period in the past.
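A numerical check of this example, with the illustrative value $\theta = 0.5$: expanding $(1 + \theta L)^{-1}$ gives the one-step forecast $\theta \sum_j (-\theta)^j x_{t-j}$, which should reproduce $\theta \varepsilon_t$.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n = 0.5, 500
eps = rng.standard_normal(n)
x = eps.copy()
x[1:] += theta * eps[:-1]        # MA(1) with x_0 = eps_0 (no pre-sample shock)

t = n - 1                        # forecast origin
# theta * (1 + theta L)^{-1} x_t = theta * sum_j (-theta)**j * x_{t-j}
weights = theta * (-theta) ** np.arange(t + 1)
forecast = weights @ x[t::-1]    # x_t, x_{t-1}, ..., x_0

print(forecast, theta * eps[t])  # identical here, since the sum telescopes
```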
Example 5.2. AR(1) process

If $x_t = \rho x_{t-1} + \varepsilon_t$, then the Wiener-Kolmogorov formula for the optimal linear projection is

$$\hat{x}_{t+k|t} = \left[\sum_{i=0}^{\infty} \rho^{i+k} L^i\right] (1 - \rho L)\, x_t = \rho^k x_t \tag{5.19}$$

Forecasts of an AR(1) process over a fixed horizon thus possess a very simple structure: $\hat{x}_{t+k+1|t} = \rho\, \hat{x}_{t+k|t}$.
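A numerical illustration with the illustrative value $\rho = 0.8$: the $k$-step forecast is $\rho^k x_t$, and the 3-step forecast error $\varepsilon_{t+3} + \rho \varepsilon_{t+2} + \rho^2 \varepsilon_{t+1}$ has mean zero and variance $1 + \rho^2 + \rho^4$.

```python
import numpy as np

# AR(1): x_t = rho * x_{t-1} + eps_t.
rng = np.random.default_rng(6)
rho, n = 0.8, 100_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]

# k-step forecasts from origin t: x_hat_{t+k|t} = rho**k * x[t],
# equivalently x_hat_{t+k+1|t} = rho * x_hat_{t+k|t}.
t = 50_000
print([rho**k * x[t] for k in range(1, 6)])

# The 3-step forecast error eps_{t+3} + rho*eps_{t+2} + rho^2*eps_{t+1}
# should have mean 0 and variance 1 + rho**2 + rho**4 (about 2.05).
err = x[3:] - rho**3 * x[:-3]
print(err.mean(), err.var())
```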
References

Ash, R. and M. Gardner (1975), Topics in Stochastic Processes. New York: Academic Press.

Rozanov, Y. (1967), Stationary Time Series. San Francisco: Holden-Day.

Wold, H. (1948), "On Prediction in Stationary Time Series," Annals of Mathematical Statistics, 19(4), 558-567.