Discrete-Time Dynamic Programming

Alexis Akira Toda

October 31, 2017

Abstract. This note explains the classic Samuelson (1969) optimal consumption-portfolio problem. Other useful references are Hakansson (1970) and Toda (2014).

1 Optimal portfolio problem

1.1 Model

Time is denoted by t = 0, 1, . . . , T (maybe T = ∞). There are J assets indexed by j ∈ J = {1, . . . , J}. (In Samuelson (1969), J = 2, a risky and a riskless asset.) Starting from some initial wealth w_0 > 0, the investor wants to maximize the lifetime expected utility

    E_0 ∑_{t=0}^T β^t c_t^{1-γ}/(1-γ),

where β > 0 is the discount factor, γ > 0 is the relative risk aversion coefficient, and c_t is consumption. We say that such an investor has additive CRRA preferences (CRRA stands for constant relative risk aversion).

Let P_t^j be the per-share price of asset j, and D_t^j be its dividend. The gross return of asset j between time t and t+1 is

    R_{t+1}^j = (P_{t+1}^j + D_{t+1}^j) / P_t^j,

that is, we adopt the convention that dividends are paid at the beginning of each period and prices are quoted at the end of each period, after dividends are paid out. Let R_{t+1} = (R_{t+1}^1, . . . , R_{t+1}^J) be the vector of asset returns. For simplicity, assume that {R_t}_{t=0}^T is i.i.d. over time. However, the joint distribution of asset returns is arbitrary in the cross-section.

Let θ_t^j be the fraction of wealth invested in asset j (θ_t^j > 0 (< 0) corresponds to a long (short) position in asset j), and let θ_t = (θ_t^1, . . . , θ_t^J) be the portfolio. By accounting, we have ∑_{j=1}^J θ_t^j = 1.

The timing is as follows. At the beginning of period t, the asset return R_t realizes and determines that period's initial wealth, w_t. Given this wealth, the investor chooses consumption c_t and portfolio θ_t. Let

    R_{t+1}(θ) = R_{t+1}'θ = ∑_{j=1}^J R_{t+1}^j θ^j

be the gross return on the portfolio between time t and t+1. Then the budget constraint is

    w_{t+1} = R_{t+1}(θ_t)(w_t - c_t) ≥ 0.
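To make the timing and budget constraint concrete, the following Python sketch simulates a wealth path. The two-asset market (one lognormal risky return, one riskless return), the 50/50 portfolio, and the constant 5% consumption rate are illustrative assumptions, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50                        # number of periods
Rf = 1.02                     # gross riskless return (assumed)
mu, sigma = 0.06, 0.2         # lognormal parameters of the risky return (assumed)
theta = np.array([0.5, 0.5])  # portfolio weights (risky, riskless); sums to 1
c_rate = 0.05                 # consume a constant 5% of wealth (assumed rule)

w = np.empty(T + 1)
w[0] = 1.0                    # initial wealth w_0
for t in range(T):
    # asset returns realized at the beginning of period t+1
    R = np.array([np.exp(mu + sigma * rng.standard_normal()), Rf])
    R_theta = R @ theta       # gross portfolio return R_{t+1}(theta) = R' theta
    c = c_rate * w[t]         # consumption chosen at time t, with 0 < c < w[t]
    w[t + 1] = R_theta * (w[t] - c)   # budget constraint
```

Since c < w_t and both gross returns are positive, the wealth path stays strictly positive, consistent with the constraint w_{t+1} ≥ 0.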

1.2 Solution: finite horizon

Since we assumed that the returns are i.i.d., the only state variable is wealth.[1] With a slight abuse of notation, let V_T(w) be the value function when T periods are left in the future (the investor has T+1 periods to live, including the current period). Then for T ≥ 1 the Bellman equation is

    V_T(w) = max_{c,θ} { c^{1-γ}/(1-γ) + β E[V_{T-1}(w')] | w' = R(θ)(w - c) }.

If T = 0, since the investor has no choice but to consume his wealth, we have

    V_0(w) = w^{1-γ}/(1-γ).

By homotheticity, we can show:

Lemma 1. For each T, there exists a_T such that V_T(w) = a_T w^{1-γ}/(1-γ).

Proof. Let c_0, . . . , c_T be the optimal consumption starting from wealth w, with value function V_T(w). By the linearity of the budget constraint, if the initial wealth is λw (where λ > 0), then the consumption λc_0, . . . , λc_T is feasible. By the homotheticity of the utility function, the associated lifetime utility is λ^{1-γ} V_T(w). Since the optimal value starting from wealth λw is V_T(λw), it follows that

    λ^{1-γ} V_T(w) ≤ V_T(λw).    (1)

To show the reverse inequality, let w' = λw and λ' = 1/λ in (1). Then we get

    (1/λ')^{1-γ} V_T(λ'w') ≤ V_T(w')  ⟺  (λ')^{1-γ} V_T(w') ≥ V_T(λ'w').    (2)

Dropping the primes (′) in (2), we get

    λ^{1-γ} V_T(w) ≥ V_T(λw).    (3)

Combining (1) and (3), V_T(λw) = λ^{1-γ} V_T(w). In particular, letting λ = 1/w, we get V_T(w) = V_T(1) w^{1-γ} ≡ a_T w^{1-γ}/(1-γ). ∎

Using the Lemma and substituting the budget constraint into the Bellman equation, we obtain

    a_T w^{1-γ}/(1-γ) = max_{c,θ} { c^{1-γ}/(1-γ) + β E[a_{T-1} (R(θ)(w-c))^{1-γ}/(1-γ)] }
                      = max_c { c^{1-γ}/(1-γ) + β a_{T-1} (w-c)^{1-γ} max_θ (1/(1-γ)) E[R(θ)^{1-γ}] }.    (4)

From (4) we obtain the second result:

[1] See https://sites.google.com/site/aatoda111/file-cabinet/172B_L08.pdf for a short note on dynamic programming.

Proposition 2. The optimal portfolio is θ* ∈ arg max_θ (1/(1-γ)) E[R(θ)^{1-γ}].

For later computations, it is useful to define

    ρ = E[R(θ*)^{1-γ}]^{1/(1-γ)} = max_θ E[R(θ)^{1-γ}]^{1/(1-γ)}.

(The second equality uses the fact that x ↦ x^{1/(1-γ)} is monotone.) Substituting the definition of ρ into (4), we get

    a_T w^{1-γ}/(1-γ) = max_c { c^{1-γ}/(1-γ) + β a_{T-1} ρ^{1-γ} (w-c)^{1-γ}/(1-γ) }.    (5)

Now the right-hand side is just a maximization in one variable, c. Since the objective function is concave in c, the first-order condition is necessary and sufficient. Therefore

    c^{-γ} - β a_{T-1} ρ^{1-γ} (w-c)^{-γ} = 0
    ⟺ c = (β a_{T-1} ρ^{1-γ})^{-1/γ} (w - c)    (6)
    ⟺ c = w / (1 + (β a_{T-1} ρ^{1-γ})^{1/γ}).    (7)

Substituting (7) into (5) and canceling 1-γ, we get

    a_T w^{1-γ} = c^{1-γ} + β a_{T-1} ρ^{1-γ} (w-c)^{1-γ}
                = c^{1-γ} + c^{-γ}(w - c)    (∵ (6))
                = w c^{-γ} = (1 + (β a_{T-1} ρ^{1-γ})^{1/γ})^γ w^{1-γ}    (∵ (7))
    ⟺ a_T^{1/γ} = 1 + (βρ^{1-γ})^{1/γ} a_{T-1}^{1/γ}.

Letting b_T = a_T^{1/γ}, we obtain a first-order linear difference equation

    b_T = 1 + (βρ^{1-γ})^{1/γ} b_{T-1}.

The initial condition is b_0 = a_0^{1/γ} = 1. Therefore the solution is

    b_T = ∑_{k=0}^T (βρ^{1-γ})^{k/γ} = (1 - (βρ^{1-γ})^{(T+1)/γ}) / (1 - (βρ^{1-γ})^{1/γ}).

Using (7), the optimal consumption rule is

    c = w/b_T = [(1 - (βρ^{1-γ})^{1/γ}) / (1 - (βρ^{1-γ})^{(T+1)/γ})] w.

Note that 1 = b_0 < b_1 < · · · < b_T < · · · , so the longer the time horizon, the smaller the fraction of wealth (1/b_T) you should consume. However, the portfolio is the same over time (at least with the i.i.d. assumption).
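The finite horizon solution above is straightforward to compute. The following Python sketch works under illustrative assumptions: a two-asset market with a lognormal risky return and a riskless return, θ* found by brute-force grid search with Monte Carlo expectations. It then iterates the difference equation b_T = 1 + (βρ^{1-γ})^{1/γ} b_{T-1} and checks it against the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, gamma = 0.95, 3.0   # discount factor, relative risk aversion (illustrative)
Rf = 1.02                 # gross riskless return (assumed)
# Monte Carlo draws of a lognormal risky gross return (assumed distribution)
risky = np.exp(0.06 + 0.2 * rng.standard_normal(100_000))

# Proposition 2: theta* maximizes (1/(1-gamma)) E[R(theta)^(1-gamma)];
# grid search over the risky share (short positions excluded for simplicity)
thetas = np.linspace(0.0, 1.0, 101)
objective = [np.mean((th * risky + (1 - th) * Rf) ** (1 - gamma)) / (1 - gamma)
             for th in thetas]
th_star = thetas[int(np.argmax(objective))]

# rho = E[R(theta*)^(1-gamma)]^(1/(1-gamma))
rho = np.mean((th_star * risky + (1 - th_star) * Rf) ** (1 - gamma)) ** (1 / (1 - gamma))

# b_T = 1 + (beta * rho^(1-gamma))^(1/gamma) * b_{T-1}, with b_0 = 1
x = (beta * rho ** (1 - gamma)) ** (1 / gamma)
T = 30
b = 1.0
for _ in range(T):
    b = 1.0 + x * b

# closed form: b_T = (1 - x^(T+1)) / (1 - x)
assert abs(b - (1 - x ** (T + 1)) / (1 - x)) < 1e-10
consumption_rate = 1.0 / b    # optimal c/w with T periods left
```

Note that the portfolio step and the consumption step decouple exactly as in (4): θ* is computed once and does not depend on T, while the consumption rate 1/b_T shrinks as the horizon lengthens.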

1.3 Solution: infinite horizon

The solution for the case with infinite horizon is basically the same. You might guess that all you need to do is to let T → ∞ in the finite horizon solution, so (assuming βρ^{1-γ} < 1) the coefficient of the value function is b = 1/(1 - (βρ^{1-γ})^{1/γ}) and the consumption rate is c/w = 1 - (βρ^{1-γ})^{1/γ}. This guess is correct, but there are technical subtleties. To address the technical issues, let us consider the following more general problem:

    max_{{c_t}_{t=0}^∞} E_0 ∑_{t=0}^∞ f_t(c_t, x_t)

subject to c_t ∈ Γ_t(x_t) and x_{t+1} = g_{t+1}(c_t, x_t). Here x_t is the state variable, c_t is the control variable, f_t(c_t, x_t) is the flow utility, Γ_t is the constraint set, and g_{t+1} is the law of motion for the state variable. A similar (general) problem is discussed in Stokey and Lucas (1989), but since they put strong assumptions on f_t (such as f_t(c, x) = β^t u(c, x) with u bounded), their results are practically inapplicable.[2] Clearly the optimal consumption-portfolio problem is a special case by reinterpreting the variables and functions.

I attack this problem as follows. Let

    V_t^T(x) = sup_{{c_{t+s}}_{s=0}^{T-1}} E_t ∑_{s=0}^{T-1} f_{t+s}(c_{t+s}, x_{t+s})

be the T-period value function starting at time t with state variable x = x_t. Let V_t^∞(x) = lim sup_{T→∞} V_t^T(x) be the infinite horizon value function, and let

    V_t^*(x) = sup_{{c_{t+s}}_{s=0}^∞} E_t ∑_{s=0}^∞ f_{t+s}(c_{t+s}, x_{t+s})

be the true value function.

Lemma 3. V_t^*(x) ≤ V_t^∞(x) always.

Proof. Take any feasible consumption plan {c_{t+s}}_{s=0}^∞ starting from x. Then by the definition of the value function, for any T we have

    E_t ∑_{s=0}^{T-1} f_{t+s}(c_{t+s}, x_{t+s}) ≤ V_t^T(x).

By the definition of the infinite horizon utility and the infinite horizon value function, letting T → ∞, we get

    E_t ∑_{s=0}^∞ f_{t+s}(c_{t+s}, x_{t+s}) = lim_{T→∞} E_t ∑_{s=0}^{T-1} f_{t+s}(c_{t+s}, x_{t+s}) ≤ lim sup_{T→∞} V_t^T(x) = V_t^∞(x).

Taking the supremum of the left-hand side over {c_{t+s}}_{s=0}^∞, we get V_t^*(x) ≤ V_t^∞(x). ∎

[2] Dynamic programming in infinite horizon is still an active area of research. See Kamihigashi (2014) for recent developments.



I say that the plan {c_{t+s}}_{s=0}^∞ is recursively optimal if it solves

    V_{t+s}^∞(x_{t+s}) = max_{c ∈ Γ(x_{t+s})} { f_{t+s}(c, x_{t+s}) + E_{t+s}[V_{t+s+1}^∞(g_{t+s+1}(c, x_{t+s}))] }

for s = 0, 1, . . . . To use the results of the finite horizon dynamic programming, we want to show V_t^*(x) = V_t^∞(x). The following proposition provides a necessary and sufficient condition.

Proposition 4. V_t^*(x) = V_t^∞(x) if and only if the transversality condition

    lim sup_{T→∞} E_t[V_T^∞(x_T)] ≤ 0

holds, where x_T is the state variable obtained from a recursively optimal policy.

Proof. Take a recursively optimal policy {c_{t+s}}_{s=0}^∞. By definition, we have

    V_t^∞(x) = E_t ∑_{s=0}^{T-1} f_{t+s}(c_{t+s}, x_{t+s}) + E_t[V_T^∞(x_T)].    (8)

Letting T → ∞ we obtain

    V_t^*(x) ≥ lim inf_{T→∞} E_t ∑_{s=0}^{T-1} f_{t+s}(c_{t+s}, x_{t+s})
             = lim inf_{T→∞} [V_t^∞(x) - E_t[V_T^∞(x_T)]]
             = V_t^∞(x) - lim sup_{T→∞} E_t[V_T^∞(x_T)]
             ≥ V_t^*(x) - lim sup_{T→∞} E_t[V_T^∞(x_T)],    (∵ Lemma 3)

so lim sup_{T→∞} E_t[V_T^∞(x_T)] ≥ 0 always. If lim sup_{T→∞} E_t[V_T^∞(x_T)] ≤ 0, then actually lim sup_{T→∞} E_t[V_T^∞(x_T)] = 0, so all the above inequalities become equalities. Therefore V_t^*(x) = V_t^∞(x).

Conversely, if V_t^*(x) = V_t^∞(x), then the recursively optimal policy is also optimal. Therefore letting T → ∞ in (8), we obtain

    V_t^*(x) = lim_{T→∞} E_t ∑_{s=0}^{T-1} f_{t+s}(c_{t+s}, x_{t+s}) + lim_{T→∞} E_t[V_T^*(x_T)]
             = V_t^*(x) + lim_{T→∞} E_t[V_T^*(x_T)],

so lim sup_{T→∞} E_t[V_T^∞(x_T)] = lim_{T→∞} E_t[V_T^*(x_T)] = 0 ≤ 0. ∎

Now we apply this proposition to solve the infinite horizon optimal consumption-portfolio problem. Assuming βρ^{1-γ} < 1, the infinite horizon value function is

    V^∞(w) = a w^{1-γ}/(1-γ),

where a^{1/γ} = b = 1/(1 - (βρ^{1-γ})^{1/γ}) > 0.

The transversality condition lim sup_{T→∞} E_0[β^T V_T^∞(w_T)] ≤ 0 (here there is β^T because it is a discounted problem) is trivial if γ > 1, because then V_T^∞(w_T) ≤ 0. If 0 < γ < 1, then by the budget constraint we have

    w_{t+1} = R_{t+1}(θ*)(w_t - c_t) ≤ R_{t+1}(θ*) w_t,

so w_T ≤ w_0 ∏_{t=1}^T R_t(θ*). Taking the (1-γ)-th power and expectations, we get

    E_0[w_T^{1-γ}] ≤ w_0^{1-γ} E[R(θ*)^{1-γ}]^T = w_0^{1-γ} ρ^{(1-γ)T}.

Hence

    E_0[β^T V_T^∞(w_T)] ≤ (a w_0^{1-γ}/(1-γ)) (βρ^{1-γ})^T → 0

as T → ∞, because βρ^{1-γ} < 1.
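As a quick numerical check that the infinite horizon rate is the limit of the finite horizon rule, here is a small Python sketch; the values of β, γ, and ρ are illustrative assumptions chosen so that βρ^{1-γ} < 1.

```python
# Infinite-horizon consumption rate as the limit of the finite-horizon rule.
beta, gamma, rho = 0.95, 3.0, 1.03   # illustrative parameter values

x = (beta * rho ** (1 - gamma)) ** (1 / gamma)
assert beta * rho ** (1 - gamma) < 1          # condition for a well-defined limit

c_w_infinite = 1.0 - x                        # guessed c/w = 1 - (beta*rho^(1-gamma))^(1/gamma)

# finite-horizon rate 1/b_T converges to c_w_infinite as T grows
for T in (1, 10, 100, 1000):
    b_T = (1 - x ** (T + 1)) / (1 - x)        # closed form for b_T
    print(T, 1 / b_T)
```

The printed rates decrease in T and settle at 1 - (βρ^{1-γ})^{1/γ}, matching the guess verified by the transversality argument above.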

2 Income fluctuation problem

With multiplicative risk (e.g., random asset returns), it is convenient to work with CRRA utilities for tractability. With additive risk (e.g., random labor income), CARA utilities are more convenient.

2.1 Model

Consider an agent with additive CARA utility

    E_0 ∑_{t=0}^∞ β^t u(c_t),    (9)

where u(c) = -e^{-γc}/γ with absolute risk aversion γ > 0.[3] The agent can borrow or save at a gross risk-free rate R > 1. The agent is subject to income risk. The income process is given by

    y_{t+1} = ρ y_t + ε_{t+1},    (10)

where 0 ≤ ρ < 1 and the error term ε_{t+1} is i.i.d. over time.[4] Letting w_t be the financial wealth at the beginning of time t (excluding current income), the budget constraint is w_{t+1} = R(w_t - c_t + y_t). The Bellman equation is

    V(w, y) = max_c { u(c) + β E[V(R(w - c + y), y')] | y' = ρy + ε }.    (11)

Since the CARA utility is defined on the entire real line, we assume that consumption can be negative.

[3] I focus on CARA preferences because they are tractable with additive shocks (Calvet, 2001; Wang, 2003, 2007; Angeletos and Calvet, 2005, 2006).
[4] Without loss of generality, we may assume that the AR(1) process (10) does not contain a constant term. This is because I have put no structure on the distribution of ε, so if there is a constant term we can always shift the distribution of ε so that the constant term is 0.
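The AR(1) income process (10) is easy to simulate. The sketch below uses normal shocks, which is an illustrative assumption since the text puts no structure on ε, and compares the sample variance with the stationary variance σ²/(1 - ρ²) implied by 0 ≤ ρ < 1.

```python
import numpy as np

rng = np.random.default_rng(0)
rho_y, sigma = 0.7, 0.1        # persistence and shock volatility (illustrative values)
T = 10_000

# y_{t+1} = rho * y_t + eps_{t+1}, eps i.i.d. N(0, sigma^2) -- normality assumed
y = np.empty(T + 1)
y[0] = 0.0
eps = sigma * rng.standard_normal(T)
for t in range(T):
    y[t + 1] = rho_y * y[t] + eps[t]

# stationary variance of an AR(1) process: sigma^2 / (1 - rho^2)
print(y.var(), sigma**2 / (1 - rho_y**2))
```

The two printed numbers should be close up to Monte Carlo error, which shrinks as T grows.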

2.2 Solution

The following proposition gives a closed-form solution of the income fluctuation problem.

Proposition 5 (Wang, 2003). The value function and optimal consumption rule are given by

    V(w, y) = -(1/(γa)) e^{-γ(aw + b + dy)},    (12a)
    c(w, y) = aw + b + dy,    (12b)

where

    a = 1 - 1/R,
    b = (1/(γ(1-R))) log(βR E[e^{-γ((R-1)/(R-ρ)) ε}]),
    d = (R-1)/(R-ρ).

Proof. Again we prove by guess-and-verify. Substituting (12a) into the Bellman equation, we obtain

    -(1/(γa)) e^{-γ(aw + b + dy)} = max_c { -(1/γ) e^{-γc} - (β/(γa)) E[e^{-γ(aR(w - c + y) + b + dy')}] }.    (13)

The first-order condition with respect to c is

    e^{-γc} - βR E[e^{-γ(aR(w - c + y) + b + dy')}] = 0.    (14)

Substituting (14) into (13), we obtain

    -(1/(γa)) e^{-γ(aw + b + dy)} = -(1/(γa)) (a + 1/R) e^{-γc}.    (15)

Comparing the coefficients, (15) trivially holds if a = 1 - 1/R and c = aw + b + dy. In this case, aR(w - c + y) = aw + (1-R)b + (1-R)(d-1)y, so (14) becomes

    e^{-γ(aw + b + dy)} = βR E[e^{-γ(aw + (1-R)b + (1-R)(d-1)y + b + dy')}]
    ⟺ e^{-γdy} = βR E[e^{-γ((1-R)b + (1-R)(d-1)y + d(ρy + ε))}].    (16)

Since (16) is an identity in y, comparing the coefficients of y, we obtain

    d = (1-R)(d-1) + ρd ⟺ d = (R-1)/(R-ρ).

Substituting into (16), we obtain

    1 = βR E[e^{-γ((1-R)b + ((R-1)/(R-ρ)) ε)}] ⟺ b = (1/(γ(1-R))) log(βR E[e^{-γ((R-1)/(R-ρ)) ε}]). ∎

Remark. Note that E[e^{-γ((R-1)/(R-ρ)) ε}] is the moment generating function M_ε(s) = E[e^{sε}] of ε evaluated at s = -γ(R-1)/(R-ρ).

Remark. We can embed this income fluctuation problem into a general equilibrium model, which is a version of the Huggett (1993) model. Toda (2017) considers such a model with VAR(1) income dynamics and shows that multiple equilibria are possible (although the equilibrium is unique in the AR(1) case). With multiple equilibria, comparative statics may go in different directions depending on the choice of the equilibrium. Toda (2017) provides an example in which increasing income risk is welfare improving!

Remark. For a proof of the transversality condition, see the appendix of Toda (2017).
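Since for ε ~ N(0, σ²) the moment generating function is M_ε(s) = e^{σ²s²/2}, the constants in Proposition 5 become fully explicit. The following Python sketch computes them; the normal shock distribution and all parameter values are illustrative assumptions.

```python
import numpy as np

beta, gamma = 0.96, 2.0            # discount factor, absolute risk aversion (illustrative)
R, rho_y, sigma = 1.03, 0.7, 0.1   # interest rate, income persistence, shock sd (illustrative)

a = 1 - 1 / R                      # marginal propensity to consume out of wealth
d = (R - 1) / (R - rho_y)          # marginal propensity to consume out of current income
s = -gamma * d                     # point where the MGF of eps is evaluated
mgf = np.exp(sigma**2 * s**2 / 2)  # M_eps(s) for eps ~ N(0, sigma^2) -- normality assumed
b = np.log(beta * R * mgf) / (gamma * (1 - R))

def consume(w, y):
    """Optimal CARA consumption rule c(w, y) = a*w + b + d*y (Proposition 5)."""
    return a * w + b + d * y

print(a, b, d)
print(consume(10.0, 1.0))
```

Note the precautionary term: the σ²s²/2 inside the logarithm lowers consumption (through b) when income risk σ² rises, a standard feature of CARA closed forms.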

References

George-Marios Angeletos and Laurent-Emmanuel Calvet. Incomplete-market dynamics in a neoclassical production economy. Journal of Mathematical Economics, 41(4-5):407-438, August 2005. doi:10.1016/j.jmateco.2004.09.005.

George-Marios Angeletos and Laurent-Emmanuel Calvet. Idiosyncratic production risk, growth and the business cycle. Journal of Monetary Economics, 53:1095-1115, 2006. doi:10.1016/j.jmoneco.2005.05.016.

Laurent-Emmanuel Calvet. Incomplete markets and volatility. Journal of Economic Theory, 98(2):295-338, June 2001. doi:10.1006/jeth.2000.2720.

Nils H. Hakansson. Optimal investment and consumption strategies under risk for a class of utility functions. Econometrica, 38(5):587-607, September 1970. doi:10.2307/1912196.

Mark Huggett. The risk-free rate in heterogeneous-agent incomplete-insurance economies. Journal of Economic Dynamics and Control, 17(5-6):953-969, September-November 1993. doi:10.1016/0165-1889(93)90024-M.

Takashi Kamihigashi. Elementary results on solutions to the Bellman equation of dynamic programming: Existence, uniqueness, and convergence. Economic Theory, 56(2):251-273, 2014. doi:10.1007/s00199-013-0789-4.

Paul A. Samuelson. Lifetime portfolio selection by dynamic stochastic programming. Review of Economics and Statistics, 51(3):239-246, August 1969. doi:10.2307/1926559.

Nancy L. Stokey and Robert E. Lucas, Jr. Recursive Methods in Economic Dynamics. Harvard University Press, 1989.

Alexis Akira Toda. Incomplete market dynamics and cross-sectional distributions. Journal of Economic Theory, 154:310-348, November 2014. doi:10.1016/j.jet.2014.09.015.

Alexis Akira Toda. Huggett economies with multiple stationary equilibria. Journal of Economic Dynamics and Control, 84:77-90, November 2017. doi:10.1016/j.jedc.2017.09.002.

Neng Wang. Caballero meets Bewley: The permanent-income hypothesis in general equilibrium. American Economic Review, 93(3):927-936, June 2003. doi:10.1257/000282803322157179.

Neng Wang. An equilibrium model of wealth distribution. Journal of Monetary Economics, 54(7):1882-1904, October 2007. doi:10.1016/j.jmoneco.2006.11.005.
