Phase transitions for controlled Markov chains on infinite graphs

Naoyuki Ichihara∗

Abstract. This paper is concerned with some long-run-average cost problems for controlled Markov chains with a denumerable state space. The criterion to be optimized contains both reward and penalty functions. As a trade-off between reward and penalty, we observe certain phase transition phenomena. Our results also provide a stochastic optimal control interpretation for phase transitions of discrete homopolymers with finite attracting potentials.

1 Introduction

Long-run-average cost problems for controlled Markov chains, or Markov decision processes, have been investigated for over half a century, and a wealth of results can be found in both the theoretical and the applied probability literature (e.g., [1, 13, 16, 17] and the references therein). One of the most fundamental and effective methods for solving such problems is stochastic dynamic programming, due to Bellman (see [2]). It allows one to determine the optimal value, together with an optimal control if one exists, in terms of the so-called optimality equation. For long-run-average cost problems, the optimality equation takes the form of a nonlinear eigenvalue problem whose particular eigenvalue characterizes the optimal value of the original control problem.

In this paper we are concerned with a special class of long-run-average cost problems for discrete-time controlled Markov chains with a denumerable state space. To illustrate the problem briefly, let Q be a set of stochastic matrices on a countably infinite set S. For each stochastic matrix q = q(x, y) belonging to Q, we denote by ((X_n)_{n≥0}, (P^q_x)_{x∈S}) the associated time-homogeneous Markov chain on S. We maximize, over all q ∈ Q, the long-run-average cost defined by

$$ J(q;\beta) := \limsup_{T\to\infty} \frac{1}{T}\, E^q_x\Big[ \sum_{n=0}^{T-1} \big(\beta r(X_n) - c(X_n,q)\big) \Big], \tag{1.1} $$

∗Department of Physics and Mathematics, Aoyama Gakuin University, 5-10-1 Fuchinobe, Chuo-ku, Sagamihara-shi, Kanagawa 252-5258, Japan. Email: [email protected].


where β ≥ 0 is a real parameter, r : S → R is a nonnegative reward function with finite support, and c : S × Q → R is a penalty function of the form

$$ c(x,q) = \frac{1}{a(x)} \sum_{y\in S} q(x,y)\,\log\frac{q(x,y)}{p(x,y)}, \qquad x \in S, \tag{1.2} $$

for some positive function a = a(x) on S and an irreducible stochastic matrix p = p(x, y) on S. More precise assumptions are given in the next section. The objective is to characterize the optimal value Λ(β) := sup_{q∈Q} J(q; β) in terms of the optimality equation, and to construct an optimal control q̄ ∈ argmax_{q∈Q} J(q; β). The method of stochastic dynamic programming (e.g., [1, 2, 16]) suggests that the optimality equation associated with (1.1) takes the form

$$ \lambda + W(x) = \sup_{q\in Q}\Big\{ \sum_{y\in S} q(x,y)\,W(y) - c(x,q) \Big\} + \beta r(x) \quad \text{in } S, \tag{1.3} $$

where the unknown of (1.3) is a pair consisting of a real constant λ and a function W = W(x) on S. Notice that (1.3) has infinitely many solutions (λ, W) in general. We will see in later sections that Λ(β) coincides with the least value of λ ∈ R for which (1.3) admits a solution W, and that an optimal control q̄ exists and can be constructed from the solution W of (1.3) with λ = Λ(β).

The novelty of this paper lies in the study of qualitative properties, with respect to β, of the optimal control problem described above. More specifically, it turns out that there exists a critical value βc such that Λ(β) > 0 for all β > βc and Λ(β) = 0 for all β ≤ βc. Furthermore, the optimal trajectory, namely the Markov chain governed by the optimal control q̄, is positive recurrent for any β > βc, while it is transient for any β < βc. This means that several qualitative properties of the problem change drastically in the vicinity of the critical value βc. We call such a phenomenon a phase transition, with a slight abuse of terminology from statistical physics. To the best of our knowledge, the analysis of phase transitions in the context of controlled Markov chains is new, although there is an extensive literature devoted to the stability of optimal trajectories (e.g., [1, 13, 16]). Notice here that the form of the penalty function (1.2) is crucial to our study of phase transitions. We do not know whether our results can be extended to other types of penalty functions.

This paper gives a nonlinear extension of the following asymptotic problem for Feynman–Kac-type functionals:

$$ F(\beta) := \limsup_{T\to\infty} \frac{1}{T}\,\log E^p_0\Big[ \exp\Big( \beta \sum_{n=0}^{T-1} r(X_n) \Big) \Big], \tag{1.4} $$

where X = (X_n) denotes the Markov chain on S with transition matrix p = p(x, y), and r = r(x) is a given potential function. Taking into account large deviation theory (cf. [8, 12]), one can expect that F(β) coincides with the smallest real value λ for which the following stationary equation admits a positive solution ϕ = ϕ(x):

$$ e^{\beta r(x)} \sum_{y\in S} p(x,y)\,\phi(y) = e^{\lambda}\,\phi(x), \qquad x \in S. \tag{1.5} $$

Remark that ϕ = ϕ(x) is a positive solution of (1.5) for some λ ∈ R if and only if W(x) := log ϕ(x) satisfies (1.3) with the same λ and with a(x) ≡ 1 in (1.2) (see Section 3 for details). In this sense, optimality equation (1.3) is a generalization of (1.5). A caveat here is that the logarithmic transform above is feasible only when a(x) in (1.2) is constant; hence our maximization problem cannot, in general, be reduced to a linear one. We refer to [4, 5] for the asymptotic problem (1.4)-(1.5) with a finite state space (see also [9] for a related study of (1.3) with a finite state space).

Phase transitions for functionals of type (1.4) have been investigated in [6] in connection with the discrete homopolymer model in statistical physics. More precisely, let X = (X_t)_{t≥0} be a continuous-time symmetric simple random walk on the d-dimensional lattice Z^d whose sample path is identified with a long polymer chain. Let F̂(β) be the so-called free energy of the polymer chain X = (X_t)_{t≥0}, defined by

$$ \hat F(\beta) := \lim_{T\to\infty} \frac{1}{T}\,\log E_0\Big[ \exp\Big( \beta \int_0^T \mathbf{1}_{\{0\}}(X_t)\,dt \Big) \Big], \tag{1.6} $$

where 1_{{0}} denotes the characteristic function of the set {0}, regarded as an attracting potential, and the parameter β stands for the reciprocal temperature in the physics literature. It is proved in [6] that there exists a critical value βc such that F̂(β) > 0 for β > βc and F̂(β) = 0 for β ≤ βc. They also establish a phase transition between globular and diffusive states of long polymer chains taking place at β = βc. In our framework, this corresponds, more or less, to the recurrence/transience dichotomy of optimally controlled Markov chains. Thus, this paper covers analogues of some results of [6], since (1.4) is the discrete-time counterpart of (1.6) (see also Remark 2.5 below). Notice that our approach is completely different from that of [6], where Fourier analysis is the main tool. Contrary to [6], Fourier analysis is not appropriate for our problem since, on the one hand, optimality equation (1.3) is essentially nonlinear and, on the other hand, no periodic structure is imposed on the state space S. It turns out that potential-theoretic arguments based on the maximum principle are the relevant tools throughout the paper (cf. [13]).

Before closing this introductory section, we mention that the same type of phase transition has been observed in [14, 15] for continuous-time controlled diffusions in R^d (see also [7] for a continuous version of [6]). However, we emphasize that the present paper is not a trivial translation from the continuous regime into the discrete one.

Several arguments used in the continuous model are not applicable here, since they depend essentially on differential calculus. Typically, Lemma 3.4 of Section 3, which plays a crucial role, is peculiar to the discrete model and cannot be derived from [7, 14, 15]. Also, the approximation procedure of Section 4, which guarantees the existence of a bounded-from-above solution to the optimality equation (1.3), is a new ingredient of this paper. It is worth mentioning that studying phase transitions in the discrete model has some advantages. First of all, we can dispense with regularity issues, i.e., the differentiability of solutions to the optimality equation, which often causes technical difficulties. Secondly, we are able to establish a refined estimate on the critical value βc which has not been obtained in the continuous case (see Theorem 6.6). The rest of this paper is organized as follows. In the next section, we state our assumptions and main results precisely. Section 3 collects some fundamental facts and key lemmas that will be used in the sequel. Sections 4, 5, and 6 are devoted to the proofs of our main results, Theorems 2.1, 2.2, and 2.3, respectively. An appendix is provided as a complement.

2 Assumptions and main results

In this section, we state our assumptions and main results precisely. Let S be a countably infinite set with a fixed reference point x0 ∈ S. For each stochastic matrix q = q(x, y) on S, we set q^n(x, y) := Σ_{z∈S} q^{n−1}(x, z) q(z, y) for n ≥ 1 and q^0(x, y) := δ_{xy}, where δ_{xy} denotes Kronecker's delta. We also set (qf)(x) := Σ_{y∈S} q(x, y) f(y) for f : S → R. Let p = p(x, y) be a given stochastic matrix on S which is fixed throughout. Set B := {(x, y) ∈ S × S | p(x, y) > 0}. Then (S, B) forms a directed infinite graph with graph distance d(x, y) := inf{n ≥ 0 | p^n(x, y) > 0}, where inf ∅ := +∞. Notice that d(x, y) ≠ d(y, x) in general. We denote by d(x) := d(x0, x) the distance function from the reference point x0. Throughout the paper, we impose the following conditions:

(A1) (a) The stochastic matrix p = p(x, y) is irreducible on S.
(b) There exists an M > 0 such that d(y, x) ≤ M for all (x, y) ∈ B.
(c) There exists an ε0 > 0 such that p(x, y) ≥ ε0 for all (x, y) ∈ B.
(d) lim_{n→∞} (1/n)(p^n d)(x0) = 0.

In view of property (c), the degree of x ∈ S is bounded uniformly in x. Indeed, if we set B_x := {y ∈ S | p(x, y) > 0}, then the number of elements of B_x, denoted by |B_x|, is at most 1/ε0 for all x ∈ S. We remark that property (d) represents a sort of centering condition for the Markov chain associated with p. For instance, let

S = Z^d with x0 = 0, and let X = (X_n) be a random walk on Z^d such that E_0[|X_1|^2] < ∞. Then, since Var[X_n] = O(n) as n → ∞, one easily sees that (d) holds if and only if E_0[X_1] = 0.

We now state our standing assumptions on the reward function r : S → R and the penalty function c = c(x, q). Let Q denote the totality of stochastic matrices q = q(x, y) on S such that q(x, y) = 0 for all (x, y) ∉ B, i.e., q(x, ·) is absolutely continuous with respect to p(x, ·) for every x ∈ S. We denote by M(B_x) the set of probability measures on B_x, equipped with the relative topology induced from the Euclidean space R^n with n = |B_x|. For each x ∈ S, we define the relative entropy function I = I(x, ·) on M(B_x) by

$$ I(x,\mu) := \sum_{y\in B_x} \mu(y)\,\log\frac{\mu(y)}{p(x,y)}, \qquad \mu = (\mu(y)) \in \mathcal{M}(B_x), \tag{2.1} $$

where we use the convention 0 log 0 = 0. Remark that, for any x ∈ S, I(x, ·) is nonnegative and strictly convex on M(B_x). Furthermore, under (A1), I(x, ·) is bounded on M(B_x), uniformly in x ∈ S. Throughout the paper, we impose the following assumptions:

(A2) r = r(x) is nonnegative and finitely supported; that is, there exist finitely many x1, . . . , xl ∈ S and α1, . . . , αl > 0 such that r(x) = Σ_{i=1}^{l} αi 1_{{xi}}(x) for all x ∈ S. Hereafter, we set supp r := {x1, . . . , xl}, the support of r.

(A3) There exists a function a : S → R with κ1 ≤ a(x) ≤ κ2 in S, for some κ1, κ2 > 0, such that c(x, q) = a(x)^{−1} I(x, q(x, ·)) for all (x, q) ∈ S × Q, where I is the relative entropy function defined by (2.1).

By the definition of c, we observe that c(x, q) = 0 if and only if q = p. We prefer to put a(x)^{−1} in (A3), rather than a(x), for simplicity of presentation in later discussions. The role of c(x, q) is now clear: it assigns, for each x ∈ S, a penalty according to the distance between q(x, ·) and p(x, ·) measured by the relative entropy. Notice that no penalty is incurred if (and only if) q = p.

Now, for each q ∈ Q, let ((X_n)_{n≥0}, (P^q_x)_{x∈S}) denote the discrete-time Markov chain on S with time-homogeneous transition probability q = q(x, y). In this paper, we use the term "q-chain" for the Markov chain associated with q ∈ Q. Given a real parameter β ≥ 0, we consider the following maximization problem:

$$ \text{Maximize} \quad J(q;\beta) := \limsup_{T\to\infty} \frac{1}{T}\, E^q_{x_0}\Big[ \sum_{n=0}^{T-1} \big(\beta r(X_n) - c(X_n,q)\big) \Big] \quad \text{subject to } q \in Q, \tag{2.2} $$

where E^q_{x_0} denotes the expectation with respect to P^q_{x_0}. The optimal value is denoted by

$$ \Lambda(\beta) := \sup_{q\in Q} J(q;\beta). \tag{2.3} $$
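As a quick numerical illustration of the entropic penalty in (2.1) and (A3) (this sketch is not part of the paper's argument), the snippet below evaluates c(x, q) = a(x)^{−1} I(x, q(x, ·)) on a single row of a stochastic matrix. The rows p_row and q_row and the value a_x are hypothetical choices made only for illustration.

```python
import math

def relative_entropy(mu, p_row):
    """I(x, mu) = sum_y mu(y) * log(mu(y) / p(x, y)), with 0 log 0 = 0."""
    val = 0.0
    for y, m in mu.items():
        if m > 0.0:
            val += m * math.log(m / p_row[y])
    return val

# a state x with three admissible successors B_x and reference row p(x, .)
p_row = {0: 0.5, 1: 0.25, 2: 0.25}   # hypothetical p(x, .)
q_row = {0: 0.7, 1: 0.2, 2: 0.1}     # hypothetical controlled row q(x, .)
a_x = 2.0                            # hypothetical a(x), so c(x,q) = I / a(x)

print(relative_entropy(p_row, p_row))        # 0.0: no penalty iff q = p
print(relative_entropy(q_row, p_row) / a_x)  # c(x, q) > 0 otherwise
```

The sketch also illustrates the remark above: the penalty vanishes exactly when q(x, ·) = p(x, ·).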

We next introduce the optimality equation associated with (2.2). For each f : S → R, we define the function H[f] : S → R by

$$ H[f](x) := \sup_{q\in Q}\big\{ (qf)(x) - c(x,q) \big\}, \qquad x \in S, \tag{2.4} $$

and consider the following equation:

$$ \lambda + W(x) = H[W](x) + \beta r(x) \quad \text{in } S, \qquad W(x_0) = 0, \tag{2.5} $$

where the unknown is the pair of a real constant λ and a function W = W(x) on S. Note that the constraint W(x0) = 0 in (2.5) is imposed to avoid the ambiguity of additive constants in W. Set

$$ \lambda^*(\beta) := \inf\{\lambda \in \mathbb{R} \mid \text{there exists a supersolution } W \text{ of (2.5)}\}. \tag{2.6} $$

Here and in the following, a function W : S → R, or a pair (λ, W), is called a supersolution (resp. subsolution) of (2.5) if

$$ \lambda + W(x) \ge H[W](x) + \beta r(x) \quad \text{in } S, \qquad W(x_0) \ge 0 \quad (\text{resp. } \le). \tag{2.7} $$

As usual, W is said to be a solution of (2.5) if it is both a sub- and a supersolution. Then we have the following result on the solvability of (2.5).

Theorem 2.1. Let (A1)-(A3) hold. Then, for any β ≥ 0, there exists a solution W of (2.5) with λ = λ∗(β) such that sup_S W < ∞. Moreover, there exists a βc ≥ 0 such that λ∗(β) = 0 for β ∈ [0, βc] and λ∗(β) > 0 for β ∈ (βc, ∞).

We are now in a position to state our main results. The first result is concerned with a characterization of the optimal value Λ = Λ(β) defined by (2.3). For each integer t ≥ 0, we define W(t, ·) : S → R inductively by W(0, x) ≡ 0 and

$$ W(t+1, x) = H[W(t,\cdot)](x) + \beta r(x), \qquad t \ge 0, \ x \in S, \tag{2.8} $$

and set

$$ \Lambda^*(\beta) := \limsup_{t\to\infty} \frac{W(t, x_0)}{t}. \tag{2.9} $$

Furthermore, let q̄ = q̄(x, y) be the stochastic matrix on S that maximizes the right-hand side of (2.4) for all x ∈ S, that is to say,

$$ H[f](x) = (\bar q f)(x) - c(x, \bar q), \qquad x \in S. \tag{2.10} $$

Remark that, for any f : S → R, there exists a unique q̄ ∈ Q such that (2.10) holds, and that q̄ is irreducible on S. We prove this fact in Section 3. In what follows, q̄ denotes this unique maximizer. We often use the notation q̄ = argmax H[f] to emphasize the dependence of q̄ on f. Then, our first main result can be stated as follows.

Theorem 2.2. Assume (A1)-(A3). Let Λ = Λ(β), λ∗ = λ∗(β), and Λ∗ = Λ∗(β) be defined by (2.3), (2.6), and (2.9), respectively. Then λ∗(β) = Λ(β) = Λ∗(β) for all β ≥ 0. Moreover, let W be the solution of (2.5) given in Theorem 2.1, and let q̄ = argmax H[W]. Then Λ(β) = J(q̄; β). In particular, q̄ is an optimal control for (2.2).

Our second main result concerns a characterization of the phase transition arising in (2.2). Let βc be the constant given in Theorem 2.1. It turns out that the positivity of βc is equivalent to the transience of the p-chain. Furthermore, the recurrence or transience of the Markov chain associated with q̄ = argmax H[W] characterizes the critical value βc. More precisely, the following theorem holds.

Theorem 2.3. Assume (A1)-(A3). Let βc ≥ 0 be the constant given in Theorem 2.1. Then βc > 0 if and only if the p-chain is transient. Moreover, for any solution W of (2.5) with λ = λ∗(β), the Markov chain associated with q̄ = argmax H[W] is transient for β < βc and positive recurrent for β > βc.

It is of interest to obtain information on the explicit value of βc when it is positive. In Section 6 we give upper and lower bounds on βc in terms of p, r, and c in (A1)-(A3). Unfortunately, in our general setting, we do not know the exact value of βc > 0 unless a = a(x) in (A3) is constant. See Section 6 for details.

Remark 2.4. As a direct consequence of Theorem 2.2, one can verify that the value F(β) defined by (1.4) coincides with the optimal value Λ(β) of (2.2) under the condition that a(x) ≡ 1 in (A3). In fact, let us set ϕ(0, x) := 1 for all x ∈ S and

$$ \phi(t,x) := E^p_x\Big[ \exp\Big( \beta \sum_{n=0}^{t-1} r(X_n) \Big) \Big], \qquad t \ge 1, \ x \in S. $$

Then, in view of the Markov property of the p-chain, together with the explicit formula for H[f] given in Lemma 3.3 of Section 3, one can see that the function W(t, x) := log ϕ(t, x) satisfies (2.8). In particular, F(β) = Λ∗(β) provided that a(x) ≡ 1 in (A3).

Remark 2.5. Let F̂(β) be defined by (1.6) and let βc be its critical value. It is proved in [6, Theorem 3.1] that βc = 0 for d = 1, 2 and βc > 0 for d ≥ 3. This fact is consistent with Theorem 2.3, since a symmetric simple random walk on Z^d is recurrent for d = 1, 2 and transient for d ≥ 3.

Remark 2.6. It is possible to replace the control space Q by a larger one. For instance, let Q̄ be the set of sequences q = (q_n)_{n≥0} such that q_0 ∈ Q and q_n : S^n → Q for all n ≥ 0. Let ((X_n)_{n≥0}, (P^q_x)_{x∈S}) be such that P^q_x(X_0 = y) = δ_{xy} and

$$ P^q_x(X_{n+1} = y_{n+1} \mid X_0 = y_0, \ldots, X_{n-1} = y_{n-1}, X_n = y_n) = q_n(y_0, \ldots, y_{n-1};\, y_n, y_{n+1}) $$

for any n ≥ 0 and x, y, y0, . . . , y_{n+1} ∈ S. Notice that X = (X_n)_{n≥0} is neither time-homogeneous nor Markovian in general. Then, for any q ∈ Q̄, one can define the cost functional (1.1) with a minor modification. If we denote by Λ̄(β) its supremum over all q ∈ Q̄, then Λ(β) ≤ Λ̄(β) in view of the natural inclusion Q ⊂ Q̄. It is also standard to verify (e.g., [16, 17]) that Λ̄(β) ≤ Λ∗(β), where Λ∗(β) is defined by (2.9). In particular, from Theorem 2.2, we obtain Λ̄(β) = λ∗(β), and there exists an optimal control q̄ in the original control space Q. With this observation in mind, we formulate our maximization problem (2.2) with the smaller control space Q.
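To make the recursion (2.8) and Remark 2.4 concrete, here is a minimal numerical sketch (not part of the paper). It runs the nonlinear iteration W(t+1) = H[W(t, ·)] + βr with a(x) ≡ 1 side by side with the linear Feynman-Kac recursion for ϕ(t, x), using the explicit formula for H from Lemma 3.3 below. The truncated state space, the lazy reflected walk, and all parameter values are hypothetical devices for illustration; the paper itself works directly on the infinite graph.

```python
import math

# hypothetical truncation: lazy random walk on {-N,...,N}, reflected at the ends
N, beta, T = 25, 0.7, 200
S = list(range(-N, N + 1)); idx = {x: i for i, x in enumerate(S)}
r = lambda x: beta if x == 0 else 0.0       # beta * r(x) with r = 1_{0}, cf. (A2)

def p_row(x):
    if x == -N: return {-N + 1: 1.0}
    if x == N:  return {N - 1: 1.0}
    return {x: 0.5, x - 1: 0.25, x + 1: 0.25}

# nonlinear recursion (2.8) with a == 1, where H[f](x) = log sum_y p(x,y) e^{f(y)}
W = [0.0] * len(S)
for _ in range(T):
    W = [math.log(sum(pr * math.exp(W[idx[y]]) for y, pr in p_row(x).items())) + r(x)
         for x in S]

# linear Feynman-Kac recursion: phi(t+1, x) = e^{beta r(x)} * (p phi(t, .))(x)
phi = [1.0] * len(S)
for _ in range(T):
    phi = [math.exp(r(x)) * sum(pr * phi[idx[y]] for y, pr in p_row(x).items())
           for x in S]

print(W[idx[0]] / T)               # ~ Lambda*(beta)
print(math.log(phi[idx[0]]) / T)   # identical: W(t, x) = log phi(t, x), Remark 2.4
```

Both printed values agree to machine precision, which is exactly the logarithmic transform discussed after (1.5); for non-constant a(x) no such linearization is available and only the nonlinear recursion applies.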

3 Key lemmas

In this section, we collect some facts and propositions that will be used repeatedly in the following sections. In the rest of this paper, we assume (A1)-(A3) and use them without further mention. We begin by stating weak and strong maximum principles for superharmonic functions associated with a given stochastic matrix q = q(x, y).

Lemma 3.1. Let q = q(x, y) be an irreducible stochastic matrix on S, and let W be a function on S. (i) Suppose that there exist an α ∈ (0, 1) and a finite set D ⊂ S such that α(qW) ≤ W in D and W ≥ 0 in S \ D. Then W ≥ 0 in S. (ii) Suppose that qW ≤ W in S and W attains its minimum at some point in S. Then W is constant in S.

Proof. We first show (i). Suppose, to the contrary, that min_D W = W(x) < 0 for some x ∈ D. Then W(y) − W(x) ≥ 0 for all y ∈ S. Therefore, using α(qW)(x) ≤ W(x), we have

$$ 0 \le \alpha \sum_{y\in S} q(x,y)\big(W(y) - W(x)\big) = \alpha(qW)(x) - \alpha W(x) \le (1-\alpha)\,W(x) < 0, $$

which is a contradiction. Hence W ≥ 0 in D. We next prove (ii). Let x ∈ S be a minimum point of W. Then we have

$$ 0 \le \sum_{y\in S} q(x,y)\big(W(y) - W(x)\big) = (qW)(x) - W(x) \le 0, $$

which implies that W(y) = W(x) for all y ∈ S with q(x, y) > 0. Using this argument repeatedly and taking into account the irreducibility of q, we conclude that W(y) = W(x) for all y ∈ S. Hence, W is constant in S.

We next recall the method of Lyapunov functions, which provides useful criteria for transience and recurrence of irreducible Markov chains with an infinite state space. The following is a version of it.

Theorem 3.2. Let q = q(x, y) be an irreducible stochastic matrix on S, V = V(x) a nonnegative function on S, and A ⊂ S a nonempty finite set. (i) Suppose that inf_{S\A} V < min_A V and qV ≤ V in S \ A. Then the q-chain is transient. (ii) Suppose that {x ∈ S | V(x) ≤ N} is finite for every N > 0, and qV ≤ V in S \ A. Then the q-chain is recurrent. (iii) Suppose that qV < ∞ in A and qV ≤ V − 1 in S \ A. Then the q-chain is positive recurrent.

Proof. This theorem is well known, so we omit the proof. We refer, for instance, to [10, Chapter 3] or [3, Chapter 5] for a complete proof.

Now, we state some fundamental properties of the function H[f] : S → R defined by (2.4).

Lemma 3.3. The function H[f] : S → R defined by (2.4) has the explicit formula

$$ H[f](x) = \frac{1}{a(x)}\,\log \sum_{y\in S} p(x,y)\,e^{a(x) f(y)}, \qquad x \in S. $$

Moreover, the stochastic matrix q̄ = q̄(x, y) defined by

$$ \bar q(x,y) = \frac{p(x,y)\,e^{a(x) f(y)}}{\sum_{z\in S} p(x,z)\,e^{a(x) f(z)}}, \qquad x, y \in S, \tag{3.1} $$

is the unique element of Q satisfying (2.10), i.e., q̄ is the unique maximizer of the right-hand side of (2.4) for every x ∈ S. In particular, q̄ is irreducible on S.

Proof. Suppose first that a(x) ≡ 1 in (A3), namely, c(x, q) = I(x, q(x, ·)). Then, by the Lagrange multiplier method, together with the strict convexity of I(x, µ) with respect to µ, one can verify that the stochastic matrix q̄ ∈ Q defined by

$$ \bar q(x,y) = \frac{p(x,y)\,e^{f(y)}}{\sum_{z\in S} p(x,z)\,e^{f(z)}}, \qquad x, y \in S, $$

is the unique maximizer of (2.4). One can also observe, by a direct computation, that

$$ H[f](x) = (\bar q f)(x) - \sum_{y\in S} \bar q(x,y)\big\{ \log \bar q(x,y) - \log p(x,y) \big\} = (\bar q f)(x) - \sum_{y\in S} \bar q(x,y)\big\{ f(y) - \log(p e^f)(x) \big\} = \log(p e^f)(x). $$

Hence, the claim is valid if a(x) ≡ 1 in (A3).


We now consider the general case. Fix any x ∈ S and set f_x(y) := a(x) f(y). Then

$$ H[f](x) = \frac{1}{a(x)} \sup_{q\in Q}\big\{ (q f_x)(x) - I(x, q(x,\cdot)) \big\} = \frac{1}{a(x)}\,\log\big(p\,e^{f_x}\big)(x) = \frac{1}{a(x)}\,\log\big(p\,e^{a(x) f(\cdot)}\big)(x), $$

which is the desired formula. It is also obvious that q̄ given by (3.1) is the unique maximizer of (2.4). Hence, we have completed the proof.

In view of Lemma 3.3, one can easily see that, if a(x) ≡ 1 in (A3), then ϕ = ϕ(x) is a positive solution of (1.5) for some λ ∈ R if and only if W(x) := log ϕ(x) is a solution of (1.3) with the same λ. From now on, q̄ = q̄(x, y) denotes the stochastic matrix defined by (3.1). In order to emphasize the dependence on f, we often use the notation q̄ = argmax H[f].

The following lemma plays a key role throughout the paper.

Lemma 3.4. Let (λ1, W1) and (λ2, W2) be, respectively, sub- and supersolutions of (2.5) in a subset A ⊂ S. Let q̄ = argmax H[W1], and set V := e^{κ1(W2−W1)}, where κ1 is the constant in (A3). Then q̄V ≤ e^{κ1(λ2−λ1)} V in A.

Proof. Fix any x ∈ A. Then we have

$$ \lambda_2 - \lambda_1 + W_2(x) - W_1(x) \ge H[W_2](x) - H[W_1](x) = \frac{1}{a(x)}\,\log\frac{(p\,e^{a(x)W_2})(x)}{(p\,e^{a(x)W_1})(x)} = \frac{1}{\kappa_1}\,\log\left( \frac{(p\,e^{a(x)W_2})(x)}{(p\,e^{a(x)W_1})(x)} \right)^{\kappa_1/a(x)}. $$

Since q̄(x, y) = p(x, y) e^{a(x)W_1(y)} / (p e^{a(x)W_1})(x) is stochastic and κ1/a(x) ≤ 1, we see, in view of Jensen's inequality, that

$$ \left( \frac{(p\,e^{a(x)W_2})(x)}{(p\,e^{a(x)W_1})(x)} \right)^{\kappa_1/a(x)} = \Big( \sum_{y\in S} \bar q(x,y)\,e^{a(x)(W_2(y)-W_1(y))} \Big)^{\kappa_1/a(x)} \ge \sum_{y\in S} \bar q(x,y)\,\big( e^{a(x)(W_2(y)-W_1(y))} \big)^{\kappa_1/a(x)} = \sum_{y\in S} \bar q(x,y)\,e^{\kappa_1(W_2(y)-W_1(y))} = (\bar q V)(x). $$

This implies that q̄V ≤ e^{κ1(λ2−λ1)} V in A. Hence, we have completed the proof.

The rest of this section is concerned with the optimality equation (2.5). Remark here that, for any λ > λ∗(β), there exists a supersolution W of (2.5). This fact follows directly from the definition of λ∗(β) and the supersolution property of W. We first verify that λ∗(β) in (2.6) is well-defined.

Lemma 3.5. λ∗(β) defined by (2.6) is finite for all β ≥ 0.

Proof. Fix any β ≥ 0. Since W ≡ 0 is a supersolution of (2.5) with λ = max_S |βr|, we see that λ∗(β) ≠ +∞. To verify that λ∗(β) ≠ −∞, fix any supersolution (λ, W) of (2.5). Since p is irreducible, there exists a finite sequence y0, y1, . . . , yn ∈ S such that y0 = yn = x0 and (y_{i−1}, y_i) ∈ B for all 1 ≤ i ≤ n. We now choose a q ∈ Q such that q(y_{i−1}, ·) = δ_{y_i} for all 1 ≤ i ≤ n, where δ_{y_i} denotes the unit probability measure concentrated on {y_i}. Then

$$ \lambda + W(y_{i-1}) \ge W(y_i) - c(y_{i-1}, q) + \beta r(y_{i-1}), \qquad 1 \le i \le n. $$

Summing these inequalities over i, we obtain nλ ≥ −n sup_{S×Q} |c|, so that λ ≥ −sup_{S×Q} |c|. Since this holds for every λ > λ∗(β) (recall that a supersolution exists for each such λ), we conclude that λ∗(β) ≥ −sup_{S×Q} |c| > −∞. Hence, the proof is complete.

The following a priori estimate for supersolutions of (2.5) will be used repeatedly.

Proposition 3.6. Let (λ, W) be a supersolution of (2.5). Then, for any (x, y) ∈ B,

$$ W(y) - W(x) \le \lambda + \sup_{S\times Q} |c|, \qquad W(x) - W(y) \le M\Big(\lambda + \sup_{S\times Q} |c|\Big), $$

where M > 0 is the constant in (A1).

Proof. Fix any (x, y) ∈ B, and choose any q ∈ Q such that q(x, ·) = δ_y. Then, by the supersolution property of W, we have

$$ \lambda + W(x) \ge W(y) - c(x,q) + \beta r(x) \ge W(y) - \sup_{S\times Q} |c|, \tag{3.2} $$

which implies the first inequality. To prove the second inequality, we choose a finite sequence y0, y1, . . . , yn ∈ S with n ≤ M such that y0 = y, yn = x, and (y_{i−1}, y_i) ∈ B for all 1 ≤ i ≤ n. Using (3.2) repeatedly, we obtain nλ + W(y) ≥ W(x) − n sup_{S×Q} |c|. Hence, the second inequality is valid, and we have completed the proof.

We close this section with two useful propositions that follow from Lemma 3.4 in combination with Theorem 3.2.

Proposition 3.7. Let (λ1, W1) be a solution of (2.5) such that λ1 > 0 and sup_S W1 < ∞. Then the following (i)-(iii) hold. (i) The Markov chain on S associated with q̄ = argmax H[W1] is positive recurrent. (ii) λ1 = λ∗(β), where λ∗(β) is the constant defined by (2.6). (iii) W1 is the unique solution of (2.5) with λ = λ∗(β).


Proof. We first prove (i). Fix any x ∈ S \ supp r, where supp r stands for the support of r = r(x) in (A2). Then we see that

$$ \lambda_1 + W_1(x) = H[W_1](x) = (\bar q W_1)(x) - c(x, \bar q) \le (\bar q W_1)(x). $$

This implies that the function V := −W1 satisfies q̄V ≤ V − λ1 in S \ supp r. Since λ1 > 0 and inf_S V > −∞ by assumption, we can apply Theorem 3.2 (iii) to conclude that the q̄-chain is positive recurrent.

We next prove (ii). Observe from the definition of λ∗(β) that λ1 ≥ λ∗(β). In order to prove the claim, we argue by contradiction, assuming that λ1 > λ∗(β). Fix an ε > 0 so small that λ1 > λ1 − ε > λ∗(β). Let Wε be a supersolution of (2.5) with λ = λ1 − ε. Then, in view of Lemma 3.4, we see that V := e^{κ1(Wε−W1)} satisfies q̄V ≤ e^{−κ1 ε}V < V in S. We now claim that V has a minimum in S. Indeed, if this were not true, then there would exist a finite set A ⊂ S and an x ∈ S \ A such that V(x) < min_A V; by Theorem 3.2 (i), the q̄-chain would be transient, a contradiction. Hence, V has a minimum in S. Applying Lemma 3.1 (ii), we conclude that V, and therefore Wε − W1, is constant in S. But this is impossible, since for constant V we would have q̄V = V, contradicting q̄V < V. Hence, λ1 = λ∗(β).

We finally prove (iii). Let W be another solution of (2.5) with λ = λ∗(β). Then, using Lemma 3.4 again, we see that q̄V ≤ V in S for V := e^{κ1(W−W1)}. By exactly the same argument as in the proof of (ii), we conclude that W − W1 is constant in S. Since W(x0) = W1(x0) = 0, we obtain W = W1 in S. Hence, we have completed the proof.

Proposition 3.8. Let (λ1, W1) be a solution of (2.5), and let q̄ = argmax H[W1]. Suppose that there exists a supersolution W of (2.5) with λ = λ1 which is strict at some point in S. Then the q̄-chain is transient.

Proof. By virtue of Lemma 3.4, the function V := e^{κ1(W−W1)} satisfies q̄V ≤ V in S. Furthermore, V does not attain a minimum in S; otherwise, V would be constant in S by Lemma 3.1 (ii), so that W = W1 + const, which contradicts the strictness of W at some point (note that H[f + const] = H[f] + const). Thus, one can apply Theorem 3.2 (i) to conclude that the q̄-chain is transient. Hence, we have completed the proof.
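The drift conditions of Theorem 3.2 are easy to test numerically. The following sketch checks condition (iii) on a hypothetical birth-death chain with inward drift; the chain and the Lyapunov function V are illustrative choices, not objects from the paper.

```python
# Theorem 3.2 (iii) on a birth-death chain on {0, 1, 2, ...}:
# q(x, x+1) = b and q(x, x-1) = 1 - b with b < 1/2, reflected at 0.
b = 0.3
def q_row(x):
    return {1: 1.0} if x == 0 else {x + 1: b, x - 1: 1.0 - b}

V = lambda x: x / (1.0 - 2.0 * b)      # candidate Lyapunov function
A = {0}                                # finite exceptional set

for x in range(1, 500):                # verify qV <= V - 1 outside A
    qV = sum(prob * V(y) for y, prob in q_row(x).items())
    assert qV <= V(x) - 1.0 + 1e-9, x
print("qV <= V - 1 off a finite set: positive recurrence by Theorem 3.2 (iii)")
```

For this particular V the drift inequality holds with equality, qV = V − 1, off the set A, which is the borderline instance of condition (iii).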

4 Proof of Theorem 2.1

This section is devoted to the proof of Theorem 2.1. Let us consider optimality equation (2.5) with λ = λ∗(β), namely,

$$ \lambda^*(\beta) + W = H[W] + \beta r \quad \text{in } S, \qquad W(x_0) = 0. \tag{4.1} $$

Note that (4.1) admits a supersolution W. This fact can be verified by a standard approximation procedure, taking into account Proposition 3.6. In particular, the infimum on the right-hand side of (2.6) is in fact a minimum. Before discussing the solvability of (4.1), we first establish some qualitative properties of λ∗(β) with respect to β, from which we deduce the second claim of Theorem 2.1.

Proposition 4.1. The value λ∗(β) is nonnegative for all β ≥ 0. Moreover, if the p-chain is positive recurrent, then λ∗(β) > 0 for all β > 0.

Proof. Fix any β ≥ 0. Let W be a supersolution of (4.1). Then λ∗(β) + W(x) ≥ (pW)(x) + βr(x) for all x ∈ S. This implies that

$$ n\lambda^*(\beta) + W(x_0) \ge (p^n W)(x_0) + \beta \sum_{k=0}^{n-1} (p^k r)(x_0), \qquad n \ge 1. $$

Furthermore, in view of Proposition 3.6, we see that

$$ W(x) \ge W(x_0) - M\Big(\lambda^*(\beta) + \sup_{S\times Q} |c|\Big)\,d(x), \qquad x \in S. $$

Using this, together with property (d) in (A1), we obtain

$$ \lambda^*(\beta) \ge -M\Big(\lambda^*(\beta) + \sup_{S\times Q} |c|\Big) \lim_{n\to\infty} \frac{1}{n}\,(p^n d)(x_0) + \beta \liminf_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} (p^k r)(x_0) \ge 0. \tag{4.2} $$

Hence, the first claim is valid. Suppose next that the p-chain is positive recurrent, and let π = π(y) be the associated invariant probability measure on S. Then, applying the ergodic theorem to the second limit in (4.2), we obtain λ∗(β) ≥ β Σ_{y∈S} π(y) r(y). Hence λ∗(β) > 0 for any β > 0, and we have completed the proof.

Proposition 4.2. The function β ↦ λ∗(β) is nondecreasing, convex, and not identically zero on [0, ∞). Moreover, λ∗(0) = 0.

Proof. Fix any 0 ≤ β1 < β2. Let W1 and W2 be supersolutions of (4.1) with β = β1 and β = β2, respectively. Since W2 is also a supersolution of (2.5) with β = β1 and λ = λ∗(β2), we have λ∗(β2) ≥ λ∗(β1). In particular, λ∗(β) is nondecreasing with respect to β. In order to verify the convexity of β ↦ λ∗(β), fix any δ ∈ (0, 1). Then, by the convexity of W ↦ H[W], we see that δW1 + (1 − δ)W2 is a supersolution of (2.5) with β = δβ1 + (1 − δ)β2 and λ = δλ∗(β1) + (1 − δ)λ∗(β2). Thus δλ∗(β1) + (1 − δ)λ∗(β2) ≥ λ∗(δβ1 + (1 − δ)β2), which implies the convexity of β ↦ λ∗(β). We next prove that λ∗(β) is not identically zero. Let W be a supersolution of (4.1). Fix any x ∈ supp r and choose a sequence y0, y1, . . . , yn ∈ S with n ≤ M + 1 such that y0 = yn = x and (y_{i−1}, y_i) ∈ B for all 1 ≤ i ≤ n. Then, arguing as in the proof of Lemma 3.5, we see that (M + 1)(λ∗(β) + sup_{S×Q} |c|) ≥ βr(x). In particular, λ∗(β) → ∞ as β → ∞. Hence, λ∗(β) is not identically zero. We finally show that λ∗(0) = 0. Since W ≡ 0 satisfies (2.5) with β = 0 and λ = 0, we observe that λ∗(0) ≤ 0. We also observe from Proposition 4.1 that λ∗(0) ≥ 0. Hence, the proof is complete.

From Proposition 4.2, one can define the critical value βc ≥ 0 by βc := max{β ≥ 0 | λ∗(β) = 0} = inf{β ≥ 0 | λ∗(β) > 0}. By the very definition of βc, we see that λ∗(β) = 0 for β ∈ [0, βc] and λ∗(β) > 0 for β ∈ (βc, ∞). Hence, the second claim of Theorem 2.1 is valid.

In the rest of this section, we prove the first claim of Theorem 2.1. More precisely, we construct a solution W of (4.1) such that sup_S W < ∞. To this end, we use an approximation procedure. Set S_N := {x ∈ S | d(x) ≤ N} and ∂S_N := {x ∈ S | d(x) = N} for N ≥ 1. Fix an N0 so large that N0 > 2M and supp r ⊂ S_{N0}, where M is the constant in (A1). In what follows, as far as the finite state space S_N is concerned, N stands for an integer satisfying N ≥ N0. Furthermore, for each x ∈ ∂S_N, we assign a point y_N(x) ∈ S_{N−M} such that d(y_N(x), x) = M. Now, we define the stochastic matrix p_N = p_N(x, y) on S_N by

$$ p_N(x,y) := \begin{cases} p(x,y) & \text{if } x \in S_{N-1}, \\ \delta_{y_N(x)}(y) & \text{if } x \in \partial S_N. \end{cases} \tag{4.3} $$

The set ∂S_N is regarded as a reflecting boundary of S_N. We claim that p_N is irreducible on S_N. Indeed, fix any x, y ∈ S_N. If x ∈ S_{N−M}, then there exist finitely many y0, y1, . . . , yn ∈ S_N with n ≤ M(N − M) such that y0 = x, yn = x0, and p_N(y_{i−1}, y_i) > 0 for all 1 ≤ i ≤ n; we then choose y_{n+1}, . . . , y_{n+l} ∈ S_N with l = d(x0, y) and y_{n+l} = y such that p_N(y_{n+i−1}, y_{n+i}) > 0 for all 1 ≤ i ≤ l. On the other hand, if x ∈ S_N \ S_{N−M}, then there exist y0, y1, . . . , yn ∈ S_N with n ≤ M such that y0 = x, yn ∈ ∂S_N, and p_N(y_{i−1}, y_i) > 0 for all 1 ≤ i ≤ n; we then choose, as in the previous case, a finite l′ and y_{n+1}, . . . , y_{n+l′} ∈ S_N such that y_{n+1} = y_N(y_n), y_{n+l′} = y, and p_N(y_{n+i−1}, y_{n+i}) > 0 for all 1 ≤ i ≤ l′. Hence, p_N is irreducible on S_N. In particular, it generates a positive recurrent Markov chain on S_N.

Let Q_N be the set of stochastic matrices q = q(x, y) on S_N such that q(x, y) = 0 for any x, y ∈ S_N with p_N(x, y) = 0. Note that q ∈ Q_N yields q(x, ·) = p_N(x, ·) = δ_{y_N(x)} for any x ∈ ∂S_N. We also define the function c_N = c_N(x, q) on S_N × Q_N by c_N(x, q) := c(x, q) for x ∈ S_{N−1} and c_N(x, q) := 0 for x ∈ ∂S_N. Set

$$ H_N[f](x) := \sup_{q\in Q_N}\big\{ (qf)(x) - c_N(x,q) \big\}, \qquad x \in S_N. \tag{4.4} $$

Observe that H_N[f] = H[f] in S_{N−1} and H_N[f] = f(y_N(·)) on ∂S_N. One can also see that there exists a unique maximizer of (4.4), which we denote by q̄ := argmax H_N[f]. Note that q̄ is irreducible on S_N. We now consider the approximating equation

$$ \lambda + W(x) = H_N[W](x) + \beta r(x) \quad \text{in } S_N, \qquad W(x_0) = 0. \tag{4.5} $$

Then the following proposition holds.

Proposition 4.3. For any N ≥ N0, there exists a unique pair (λN, WN) such that (4.5) holds. Moreover, set K0 := sup_{S×Q} |c| + max_S |βr|. Then 0 < λN ≤ K0 and |WN(x)| ≤ 2M K0 d(x) for all x ∈ S_{N−1}.

Proof. Existence and uniqueness of the solution to (4.5), as well as the upper bound on λN, follow from Theorem A.3 in the appendix. The positivity of λN can be verified as in the proof of Proposition 4.1, noting that the p_N-chain is positive recurrent. Furthermore, since p_N(x, ·) = p(x, ·) and H_N[WN](x) = H[WN](x) for all x ∈ S_{N−1}, one can use Proposition 3.6 repeatedly to obtain the bound on WN. Hence, we have completed the proof.

We next show that WN is bounded above on S_N uniformly in N. To this end, we first observe the following estimate.

Lemma 4.4. For any N ≥ N0 and γ ≥ 0, the function U(x) := −γ d(x) satisfies

$$ H_N[U](x) - U(x) \le \gamma M, \qquad x \in S_N. $$

Proof. Let q̄ = argmax H_N[U] ∈ Q_N, and fix any x ∈ S_N. Then, in view of the triangle inequality d(x0, x) ≤ d(x0, y) + d(y, x), we see that

$$ H_N[U](x) - U(x) \le \sum_{y\in S_N} \bar q(x,y)\big(U(y) - U(x)\big) \le \gamma \sum_{y\in S_N} \bar q(x,y)\,d(y,x). $$

Since d(y, x) ≤ M for any y ∈ S_N with p_N(x, y) > 0, we conclude that

$$ H_N[U](x) - U(x) \le \gamma \sum_{y\in S_N} \bar q(x,y)\,d(y,x) \le \gamma M \sum_{y\in S_N} \bar q(x,y) = \gamma M. $$

Hence, we have completed the proof.

Lemma 4.5. For any N ≥ N0, the solution (λN, WN) of (4.5) satisfies

$$ W_N(x) \le -\frac{\lambda_N}{2M}\,d(x) + \Big( 2M K_0 + \frac{\lambda_N}{2M} \Big) C_0, \qquad x \in S_N, \tag{4.6} $$

where K0 := sup_{S×Q} |c| + max_S |βr| and C0 := max_{supp r} d(x0, ·).

Proof. Set γ := λN/(2M) > 0, and denote the right-hand side of (4.6) by U = U(x). Then, in view of the estimate on WN in Proposition 4.3, we have WN(x) ≤ U(x) for any x ∈ supp r. Furthermore, since γM < λN, we see from Lemma 4.4 that U is a supersolution of λN + W = H_N[W] in S_N \ supp r. Observing that WN is a solution of the same equation there, we have U − WN ≥ H_N[U] − H_N[WN] ≥ q̄(U − WN) in S_N \ supp r, where q̄ = argmax H_N[WN]. Since q̄ is irreducible on S_N, one can apply Theorem A.1 (ii) in the appendix, the comparison principle, to conclude that U ≥ WN in S_N. Hence, we have completed the proof.

We are now ready to prove the first claim of Theorem 2.1, which is a direct consequence of the following proposition.

Proposition 4.6. There exists a solution W of (4.1) such that sup_S W < ∞. Moreover, the solution of (4.1) is unique provided λ∗(β) > 0. In this case, W satisfies W ≤ −γd + C in S for some γ > 0 and C > 0, and the Markov chain associated with q̄ = argmax H[W] is positive recurrent.

Proof. We first construct a solution W of (4.1) such that sup_S W < ∞. Let (λN, WN) be the unique solution of (4.5). Then, by virtue of Proposition 4.3, the sequence {λN} is bounded, and the family {WN} is bounded locally uniformly in S. In particular, there exist a constant λ∞ ≥ 0, a function W : S → R, and a sequence {Nk} with Nk → ∞ as k → ∞ such that λ_{Nk} → λ∞ and W_{Nk}(x) → W(x) as k → ∞ for all x ∈ S. Letting k → ∞ in (4.5) and (4.6) with N = Nk, we see that (λ∞, W) is a solution of (2.5) satisfying sup_S W < ∞. We can also see, in view of (2.6) and Proposition 4.1, that λ∞ ≥ λ∗(β) ≥ 0. We now claim that λ∞ = λ∗(β). If λ∞ = 0, then 0 = λ∞ ≥ λ∗(β) ≥ 0, so the claim is obvious. If λ∞ > 0, then we can apply Proposition 3.7 (ii) to conclude that λ∞ = λ∗(β). Hence, we have constructed the desired solution. The uniqueness of the solution to (4.1) for λ∗(β) > 0 follows from Proposition 3.7 (iii). The upper bound on the solution can be deduced from (4.6). It is also easy to see from Proposition 3.7 (i) that the Markov chain associated with q̄ = argmax H[W] is positive recurrent. Hence, we have completed the proof.
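As a numerical companion to this section (a sketch only, not part of the proof), the snippet below solves the finite-state equation (4.5) by relative value iteration, with the truncated matrix p_N of (4.3) built from a hypothetical lazy walk. The state space, the function a, and β are illustrative assumptions.

```python
import math

N, beta = 30, 0.8
S = list(range(-N, N + 1)); idx = {x: i for i, x in enumerate(S)}
a = lambda x: 1.0 + 0.5 * (x % 2 == 0)   # hypothetical a(x), kappa1 = 1, kappa2 = 1.5
r = lambda x: beta if x == 0 else 0.0

def pN_row(x):                            # (4.3): boundary rows jump back inward
    if abs(x) == N:
        return {x - 1 if x > 0 else x + 1: 1.0}
    return {x: 0.5, x - 1: 0.25, x + 1: 0.25}

def HN(f):                                # (4.4), with c_N = 0 on the boundary
    out = []
    for x in S:
        if abs(x) == N:
            ((y, _),) = pN_row(x).items()
            out.append(f[idx[y]])         # H_N[f](x) = f(y_N(x)) on the boundary
        else:                             # explicit formula of Lemma 3.3 inside
            s = sum(pr * math.exp(a(x) * f[idx[y]]) for y, pr in pN_row(x).items())
            out.append(math.log(s) / a(x))
    return out

W, lam = [0.0] * len(S), 0.0
for _ in range(5000):                     # relative value iteration for (4.5)
    Wn = [h + r(x) for h, x in zip(HN(W), S)]
    lam = Wn[idx[0]]                      # gain estimate; keeps W(x0) = 0
    W = [w - lam for w in Wn]

print(lam, max(W))                        # lambda_N > 0 and W_N bounded above
```

Consistently with Proposition 4.3 and Lemma 4.5, the computed gain is strictly positive and the normalized relative value function stays bounded above, decaying linearly away from supp r.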

5 Proof of Theorem 2.2

In this section we prove Theorem 2.2. The proof is divided into several steps. We first observe the following.

Proposition 5.1. Let W be a solution of (4.1) such that sup_S W < ∞, and let q̄ = argmax H[W]. Then λ∗(β) ≤ J(q̄; β) ≤ Λ(β) ≤ Λ∗(β).

Proof. In view of (4.1) and the Markov property of the q̄-chain, we see that

$$ \lambda^*(\beta)\,n = \lambda^*(\beta)\,n + W(x_0) = E^{\bar q}_{x_0}\Big[ \sum_{k=0}^{n-1} \big(\beta r(X_k) - c(X_k,\bar q)\big) + W(X_n) \Big] \le E^{\bar q}_{x_0}\Big[ \sum_{k=0}^{n-1} \big(\beta r(X_k) - c(X_k,\bar q)\big) \Big] + \sup_S W $$

for any n ≥ 1. Dividing both sides by n and letting n → ∞, we obtain

$$ \lambda^*(\beta) \le \limsup_{n\to\infty} \frac{1}{n}\, E^{\bar q}_{x_0}\Big[ \sum_{k=0}^{n-1} \big(\beta r(X_k) - c(X_k,\bar q)\big) \Big] = J(\bar q;\beta) \le \Lambda(\beta). $$

It is also easy to see from the recursive equation (2.8) and W(0, ·) = 0 that, for any q ∈ Q,

$$ W(t, x) \ge E^q_{x}\Big[ \sum_{n=0}^{t-1} \big(\beta r(X_n) - c(X_n,q)\big) \Big], \qquad t \ge 1, \ x \in S. \tag{5.1} $$

Thus, dividing both sides by t and letting t → ∞, and then taking the supremum over all q ∈ Q, we conclude that Λ∗(β) ≥ Λ(β). Hence, we have completed the proof.

The rest of this section is devoted to establishing the inequality Λ∗(β) ≤ λ∗(β), which completes the proof of Theorem 2.2. It is obvious from the definition of W(t, ·) that Λ∗(β) is nondecreasing with respect to β and that Λ∗(0) = 0. Thus, in order to prove Theorem 2.2, it suffices to show that Λ∗(β) ≤ λ∗(β) for any β with λ∗(β) > 0. The key to the proof lies in the construction of a suitable supersolution V = V(t, x) of (2.8) with V(0, ·) = 0. To this end, we first construct a sequence {Wn} of functions on S which approximates the solution W of (4.1) appropriately. More specifically, let ϕ(x) := −√(d(x)), and consider the following equation with discount factor α ∈ (0, 1):

$$ V(x) = H[\alpha V](x) + \beta r(x) + (1-\alpha)\,\phi(x) \quad \text{in } S. \tag{5.2} $$

Note that ϕ satisfies the following properties.

Lemma 5.2. Let ϕ(x) := −√(d(x)). Then

$$ \sup_S \phi \le 0, \qquad \lim_{d(x)\to\infty} \phi(x) = -\infty, \qquad \sup_{(x,y)\in B} |\phi(y) - \phi(x)| < \infty. \tag{5.3} $$

Moreover, for any ε > 0, there exists an Nε ≥ N0 such that

$$ H[\phi] - \phi \le \varepsilon \quad \text{in } S \setminus S_{N_\varepsilon}, \tag{5.4} $$

where N0 is the constant, fixed in Section 4, such that N0 > 2M and supp r ⊂ S_{N0}.

Proof. We only verify (5.4), since (5.3) is obvious. Let q̄ = argmax H[ϕ]. Then, for any N ≥ N0 and x ∈ S \ S_N, we have

$$ H[\phi](x) - \phi(x) \le \sum_{y\in S} \bar q(x,y)\big( \sqrt{d(x)} - \sqrt{d(y)} \big) = \sum_{y\in S} \frac{\bar q(x,y)\,(d(x) - d(y))}{\sqrt{d(x)} + \sqrt{d(y)}} \le \frac{M}{2\sqrt{d(x)} - M} \le \frac{M}{2\sqrt{N} - M}. $$

Taking N = Nε so large that the right-hand side is less than ε, we obtain (5.4). Hence, we have completed the proof.

We next verify the solvability of (5.2).

Proposition 5.3. Let ϕ(x) := −√(d(x)). Then, for any α ∈ (0, 1), there exists a solution V = Vα of (5.2) such that, for all (x, y) ∈ B,

$$ (1-\alpha)\,|V_\alpha(x) - \phi(x)| \le C, \qquad \alpha\,|V_\alpha(x) - V_\alpha(y)| \le C, \tag{5.5} $$

where C > 0 is a constant not depending on α.

Proof. We define p_N = p_N(x, y) and H_N[f] : S_N → R by (4.3) and (4.4), respectively. We first consider the approximating equation on the finite state space S_N:

$$ V = H_N[\alpha V] + \beta r + (1-\alpha)\,\phi \quad \text{in } S_N. \tag{5.6} $$

Then, in view of Theorem A.2 in the appendix, there exists a unique solution V = V_N of (5.6). Let C1 := sup_{(x,y)∈B} |ϕ(x) − ϕ(y)| < ∞ and q̄ = argmax H_N[ϕ]. Then, for any x ∈ S_N, we have

$$ H_N[\phi] - \phi \le \sum_{y\in S_N} \bar q(x,y)\big(\phi(y) - \phi(x)\big) \le C_1, \qquad H_N[\phi] - \phi \ge \sum_{y\in S_N} p_N(x,y)\big(\phi(y) - \phi(x)\big) \ge -C_1. $$

Hence, sup_{S_N} |H_N[ϕ] − ϕ| ≤ C1. We now set C2 := max_S |βr| + C1, and define V1, V2 : S → R by V1 := ϕ − C2/(1 − α) and V2 := ϕ + C2/(1 − α), respectively. Then it is not difficult to verify that V1 and V2 are sub- and supersolutions of (5.6), respectively. Applying Theorem A.1 (i) in the appendix, we conclude that V1 ≤ V_N ≤ V2 in S_N. In particular,

$$ (1-\alpha)\,|V_N(x) - \phi(x)| \le C_2 \quad \text{in } S_N. \tag{5.7} $$

Next, for any (x, y) ∈ B and N with x, y ∈ S_{N−1}, we choose a q ∈ Q_N such that q(x, ·) = δ_y. Then, from (5.6), we observe that

$$ V_N(x) \ge \alpha V_N(y) - c_N(x,q) + \beta r(x) + (1-\alpha)\,\phi(x). $$

Noting that c_N(x, ·) = c(x, ·) for all x ∈ S_{N−1}, we have

$$ \alpha\big(V_N(y) - V_N(x)\big) \le c_N(x,q) + (1-\alpha) \max_{S_N}(V_N - \phi) \le \sup_{S\times Q} |c| + C_2 =: C_3. $$

On the other hand, for any (x, y) ∈ B, we select finitely many points y0, y1, . . . , yn ∈ S with n ≤ M such that y0 = y, yn = x, and (y_{i−1}, y_i) ∈ B for all 1 ≤ i ≤ n, and choose an N so large that y_i ∈ S_{N−1} for all 1 ≤ i ≤ n. Then we have

$$ \alpha\big(V_N(x) - V_N(y)\big) = \sum_{i=1}^{n} \alpha\big(V_N(y_i) - V_N(y_{i-1})\big) \le n C_3 \le M C_3. $$

Hence, for any (x, y) ∈ B and N with x, y ∈ S_{N−1}, we obtain

$$ \alpha\,|V_N(x) - V_N(y)| \le M C_3. \tag{5.8} $$

Taking into account (5.7) and (5.8), one can find a sequence {Nk} with Nk → ∞ as k → ∞ and a function V : S → R such that V_{Nk} → V in S as k → ∞. It is easy to see that V is a solution of (5.2) which satisfies (5.5) with C = M C3. Hence, we have completed the proof.

We now consider the following equation:

$$ \lambda + W = H[\alpha W] + \beta r + (1-\alpha)\,\phi \quad \text{in } S, \qquad W(x_0) = 0, \tag{5.9} $$

where the unknown is the pair (λ, W). By virtue of Proposition 5.3, one can construct a sequence of solutions of (5.9) which approximates the unique solution of (4.1) when λ∗(β) > 0.

Proposition 5.4. Set ϕ(x) := −√(d(x)). Assume that λ∗(β) > 0. Then there exist an increasing sequence {αn} ⊂ (0, 1) converging to 1 as n → ∞, a sequence of solutions (λn, Wn) of (5.9) with α = αn, and a constant K > 0 such that Wn ≤ ϕ + K in S for every n ≥ 1. Moreover, as n → ∞, λn and Wn converge to λ∗(β) and to the unique solution W of (4.1), respectively.

Proof. Let Vα be the solution of (5.2) constructed in Proposition 5.3. Then, in view of (5.5), there exists an increasing sequence {αn} ⊂ (0, 1) converging to 1 as n → ∞ such that λn := (1 − αn)V_{αn}(x0) and Wn := V_{αn} − V_{αn}(x0) converge, as n → ∞, to a constant λ∞ ∈ R and a function W∞ : S → R, respectively. Note here that (λn, Wn) and (λ∞, W∞) satisfy (5.9) and (2.5), respectively. It is also easy to see from (5.5) that there exists a C > 0 such that, for any n ≥ 1 and (x, y) ∈ B,

$$ (1-\alpha_n)\,|W_n(x) - \phi(x)| \le C, \qquad \alpha_n\,|W_n(x) - W_n(y)| \le C. \tag{5.10} $$

Furthermore, since λ∞ ≥ λ∗(β) > 0, we may assume, by renumbering n if necessary, that λn ≥ λ∗(β)/2 for all n ≥ 1.

We now prove that Wn ≤ ϕ + K in S for some K > 0 not depending on n. By virtue of Lemma 5.2, there exists a number N1 ≥ N0 such that H[ϕ] − ϕ ≤ λ∗(β)/4 in S \ S_{N1}. Set K := sup_{n≥1} max_{S_{N1}}(|Wn| + |ϕ|) < ∞, and define W′ : S → R by W′ := (1 − δ)ϕ + K, where δ ∈ (0, 1) is an arbitrarily fixed number. We prove that Wn ≤ W′ in S for all n ≥ 1, which leads to the desired estimate after sending δ → 0. By the definition of K, it is obvious that Wn ≤ W′ in S_{N1}. We can also see from (5.10) that Wn ≤ ϕ + C/(1 − αn) in S for all n. Since ϕ(x) → −∞ as d(x) → ∞, there exists a number Nn > N1 such that Wn ≤ W′ in S \ S_{Nn}. Hence, it remains to show that Wn ≤ W′ in S_{Nn} \ S_{N1}. We first observe that W′ is a supersolution of λn + W = H[αn W] + (1 − αn)ϕ in the finite set S_{Nn} \ S_{N1}. Indeed, since H[ϕ] − ϕ ≤ λ∗(β)/4 in S \ S_{N1}, we see that, for any x ∈ S_{Nn} \ S_{N1},

$$ \lambda_n + W'(x) - H[\alpha_n W'](x) - (1-\alpha_n)\,\phi(x) \ge \frac{\lambda^*(\beta)}{2} - \alpha_n(1-\delta)\big(H[\phi](x) - \phi(x)\big) + (1-\alpha_n)\big(-\delta\phi + K\big) \ge \frac{\lambda^*(\beta)}{4} > 0. $$

On the other hand, Wn is a solution of the same equation in S_{Nn} \ S_{N1}. In particular, we have W′ − Wn ≥ αn q̄(W′ − Wn) in S_{Nn} \ S_{N1}, where q̄ = argmax H[αn Wn]. Applying Lemma 3.1 (i), we conclude that Wn ≤ W′ = (1 − δ)ϕ + K in S. Hence, Wn ≤ ϕ + K in S.

We finally claim that λ∞ = λ∗(β) and that W∞ is the unique solution of (4.1). Letting n → ∞ in the inequality Wn ≤ ϕ + K, we observe that sup_S W∞ < ∞. Then one can apply Proposition 3.7 (ii) and (iii) to conclude that λ∞ = λ∗(β) and that W∞ is the unique solution of (4.1). Hence, we have completed the proof.

Remark 5.5. In Propositions 5.3 and 5.4, the choice of ϕ can be relaxed. Indeed, both propositions hold true for any function ϕ : S → R satisfying (5.3) and (5.4).

Let {Wn} be the sequence of approximating functions given in Proposition 5.4, and let Cn > 0 be such that Wn ≥ ϕ − Cn in S for each n (see (5.10)). Fix any ε ∈ (0, 1) and define the function Vε : {0, 1, 2, . . . } × S → R by

$$ V_\varepsilon(t,x) := (1-\rho^t)\big\{ (1-\delta)\,\alpha_n W_n(x) + \delta\,d(x) + K_\delta + C_n \big\} + \frac{1-\rho^t}{1-\rho}\,\max_S |\beta r| + \big(\lambda^*(\beta)+\varepsilon\big)\,t, \tag{5.11} $$

where we have set Kδ := max_{x∈S} (√(d(x)) − δ d(x)) > 0, and the constants δ, ρ ∈ (0, 1) and n ≥ 1 will be specified later. It is obvious that Vε(0, ·) = 0 in S. Furthermore, since Wn ≥ ϕ − Cn = −√d − Cn in S, we see that, for any δ and n,

$$ (1-\delta)\,\alpha_n W_n + \delta d + K_\delta + C_n \ge -\sqrt d + \delta d + K_\delta \ge 0 \quad \text{in } S. $$

In particular, max_S |βr| < Vε(1, x) ≤ Vε(t + 1, x) and Vε(t + 1, x) ≥ Vε(t, x) + λ∗(β) + ε for any t = 0, 1, 2, . . . and x ∈ S.

Proposition 5.6. Assume that λ∗(β) > 0. Then, for any ε ∈ (0, 1), there exist δ, ρ ∈ (0, 1) and n ≥ 1 such that the function Vε defined by (5.11) is a supersolution of (2.8) with Vε(0, ·) = 0 in S.

Proof. Since Vε(1, x) − H[Vε(0, ·)](x) − βr(x) ≥ max_S |βr| − βr(x) ≥ 0 for all x ∈ S, it suffices to prove the supersolution property for t ≥ 1. In what follows, we assume t ≥ 1. Noting the convexity of f ↦ H[f] and H[0] = 0, we see that

$$ H[V_\varepsilon(t,\cdot)](x) - V_\varepsilon(t+1,x) \le H[V_\varepsilon(t,\cdot)](x) - V_\varepsilon(t,x) - \lambda^*(\beta) - \varepsilon \le (1-\rho^t)(1-\delta)\big( H[\alpha_n W_n](x) - \alpha_n W_n(x) \big) + \delta(1-\rho^t)\big( H[d](x) - d(x) \big) - \lambda^*(\beta) - \varepsilon. $$

Furthermore, since (λn, Wn) is a solution of (5.9) with α = αn and Wn ≤ ϕ + K in S for some K > 0 not depending on n, we have

$$ H[\alpha_n W_n] - \alpha_n W_n = \lambda_n - \beta r + (1-\alpha_n)(W_n - \phi) \le \lambda_n - \beta r + (1-\alpha_n)K \quad \text{in } S. $$

From this and the estimate H[d] − d ≤ 1 in S, together with the positivity of λn, we obtain

$$ H[V_\varepsilon(t,\cdot)] + \beta r - V_\varepsilon(t+1,\cdot) \le (1-\rho^t)(1-\delta)\big( \lambda_n - \beta r + (1-\alpha_n)K \big) + \delta(1-\rho^t) + \beta r - \lambda^*(\beta) - \varepsilon \le \lambda_n - \lambda^*(\beta) + (\delta+\rho)\max_S|\beta r| + (1-\alpha_n)K + \delta - \varepsilon, $$

where we have used t ≥ 1 to deduce the second inequality. Choosing δ, ρ ∈ (0, 1) so small and n ≥ 1 so large that the right-hand side is negative, we conclude that Vε is a supersolution of (2.8). Hence, we have completed the proof.

We are now in a position to complete the proof of Theorem 2.2.

Proof of Theorem 2.2. It remains to prove that Λ∗(β) ≤ λ∗(β) for any β such that λ∗(β) > 0. Fix any ε > 0. Let W = W(t, x) be the solution of (2.8) with W(0, ·) = 0, and let Vε be the supersolution of (2.8) constructed in Proposition 5.6. Since W(0, ·) = Vε(0, ·) = 0 in S, we see, in view of the monotonicity of f ↦ H[f] and the recursive equation (2.8), that W(t, x) ≤ Vε(t, x) for all t ≥ 1 and x ∈ S. Dividing both sides by t and sending t → ∞, we obtain Λ∗(β) ≤ λ∗(β) + ε. Since ε > 0 is arbitrary, we conclude that Λ∗(β) ≤ λ∗(β). Hence, the proof is complete.


Remark 5.7. In Sections 4 and 5, we have used two different approximations to construct a solution W of (4.1). The approximation of Section 4 may look unusual, but it is effective in producing a bounded-from-above solution of (4.1). The approximation of this section, often called the vanishing discount method, is quite standard for long-run-average cost problems (cf. [13, 16]). Both approximations converge to the unique solution W of (4.1) provided that λ∗(β) > 0. However, we do not know whether they converge to the same function when λ∗(β) = 0. In particular, the vanishing discount approximation need not produce a bounded-from-above solution of (4.1) in general. This is the main reason why we needed a different approximation in Section 4.
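For comparison with the sketch after Section 4, here is a sketch of the vanishing discount method of this section: the discounted equation (5.2) is solved by fixed-point iteration (the map is a sup-norm contraction with modulus α), and (1 − α)V_α(x0) is tracked as α ↑ 1, as in Proposition 5.4. As before, the truncated lazy walk and all parameters are hypothetical illustrations, not objects from the paper.

```python
import math

N, beta = 25, 0.5
S = list(range(-N, N + 1)); idx = {x: i for i, x in enumerate(S)}
r = lambda x: beta if x == 0 else 0.0
phi = lambda x: -math.sqrt(abs(x))        # phi(x) = -sqrt(d(x)), as in Lemma 5.2

def p_row(x):
    if x == -N: return {-N + 1: 1.0}
    if x == N:  return {N - 1: 1.0}
    return {x: 0.5, x - 1: 0.25, x + 1: 0.25}

def H(f):                                 # a == 1; log-sum-exp for stability
    out = []
    for x in S:
        vals = [(pr, f[idx[y]]) for y, pr in p_row(x).items()]
        m = max(v for _, v in vals)
        out.append(m + math.log(sum(pr * math.exp(v - m) for pr, v in vals)))
    return out

for alpha in (0.9, 0.99, 0.999):
    V = [0.0] * len(S)
    for _ in range(int(20 / (1 - alpha))):        # contraction with modulus alpha
        HA = H([alpha * v for v in V])
        V = [HA[i] + r(x) + (1 - alpha) * phi(x) for i, x in enumerate(S)]
    print(alpha, (1 - alpha) * V[idx[0]])         # -> lambda*(beta) as alpha -> 1
```

The printed values stabilize as α ↑ 1, mirroring the convergence λn = (1 − αn)V_{αn}(x0) → λ∗(β) established in Proposition 5.4.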

6 Proof of Theorem 2.3

The goal of this section is to prove Theorem 2.3. For this purpose, we prepare some auxiliary propositions.

Proposition 6.1. Let W be a solution of (4.1), and let q̄ = argmax H[W]. Suppose that β < βc. Then the q̄-chain is transient.

Proof. Let Wc be a solution of (4.1) with β = βc. Then, for any β < βc, Wc is a supersolution of (4.1) which is strict at every x ∈ supp r. Applying Proposition 3.8, we conclude that the q̄-chain is transient. Hence, the proof is complete.

Proposition 6.2. Assume that the p-chain is transient. Then there exist a nonpositive function W0 on S and a sequence {xn} ⊂ S such that W0(xn) → −∞ as d(xn) → ∞, sup_{(x,y)∈B} |W0(y) − W0(x)| < ∞, and

$$ H[W_0](x) \le W_0(x) \quad \text{in } S \setminus \mathrm{supp}\, r, \qquad W_0 = 0 \quad \text{in } \mathrm{supp}\, r. $$

Proof. Set w(x) := P^p_x(τ < ∞) for x ∈ S, where τ is the stopping time defined by τ := inf{n ≥ 0 | Xn ∈ supp r}. Then w is the minimal nonnegative solution of

$$ pw = w \quad \text{in } S \setminus \mathrm{supp}\, r, \qquad w = 1 \quad \text{in } \mathrm{supp}\, r. \tag{6.1} $$

We claim that w > 0 in S. Indeed, suppose to the contrary that w(x) = 0 for some x ∈ S \ supp r. Then, since w ≥ 0 in S and p is irreducible, we see in view of (6.1) that there exist some y, z ∈ S with (y, z) ∈ B, y ∉ supp r, and z ∈ supp r, such that w(y) = 0. Using (6.1), we have 0 = w(y) = Σ_{y′∈S} p(y, y′) w(y′) ≥ p(y, z) w(z) = p(y, z) > 0, which is a contradiction. Hence w > 0 in S. Furthermore, we have inf_S w = 0. Indeed, suppose that inf_S w = c > 0. Then w̃ := (w − c)/(1 − c) is nonnegative in S and satisfies (6.1), which contradicts the minimality of w. Hence inf_S w = 0. In particular, there exists a sequence {xn} ⊂ S such that w(xn) → 0 as d(xn) → ∞.

We next claim that there exist k1, k2 > 0 such that, for any (x, y) ∈ B,

$$ k_1\,w(y) \le w(x) \le k_2\,w(y). \tag{6.2} $$

To prove this, we fix any N ≥ N0 + M, so that supp r ⊂ S_{N−M} by the definition of N0 (see Section 4). Then it is obvious that (6.2) is valid, for suitable k1, k2 > 0, for any (x, y) ∈ B with x ∈ S_N. Thus, it suffices to verify (6.2) for (x, y) ∈ B with x ∈ S \ S_N. In what follows, we fix such an (x, y) ∈ B arbitrarily. Then, in view of (6.1), we see that

$$ w(x) = \sum_{z\in S} p(x,z)\,w(z) \ge p(x,y)\,w(y) \ge \varepsilon_0\,w(y), $$

where ε0 > 0 is the constant in (A1). This gives the first inequality in (6.2) with k1 = ε0. To verify the second inequality, choose y0, y1, . . . , yn ∈ S \ supp r with n ≤ M such that y0 = y, yn = x, and (y_{i−1}, y_i) ∈ B for all 1 ≤ i ≤ n. Then, using the previous estimate repeatedly, we easily see that the second inequality holds with k2 = ε0^{−M}. Hence, modifying k1, k2 suitably, we conclude that (6.2) holds for any (x, y) ∈ B.

We now set W0(x) := (1/κ2) log w(x), where κ2 is the constant in (A3). Then W0 is nonpositive in S, W0 = 0 in supp r, and W0(xn) → −∞ as d(xn) → ∞. It is also easy to see from (6.2) that there exists a C > 0 such that |W0(x) − W0(y)| ≤ C for all (x, y) ∈ B. Moreover, for any x ∈ S \ supp r, we have

$$ H[W_0](x) = \sup_{q\in Q}\big\{ (qW_0)(x) - c(x,q) \big\} \le \frac{1}{\kappa_2} \sup_{q\in Q}\big\{ \big(q(\kappa_2 W_0)\big)(x) - I(x, q(x,\cdot)) \big\} = \frac{1}{\kappa_2}\,\log\big(p\,e^{\kappa_2 W_0}\big)(x) = \frac{1}{\kappa_2}\,\log(pw)(x) = \frac{1}{\kappa_2}\,\log w(x) = W_0(x). $$

Hence, we have completed the proof.

Proposition 6.3. Assume that the p-chain is transient. Let W0 be the function given in Proposition 6.2. Then there exists a solution W of (4.1) with β = βc such that sup_S(W − W0) < ∞ and sup_S W = max_{supp r} W.

Proof. Let {βn} be a decreasing sequence such that βn → βc as n → ∞. We may assume without loss of generality that βn < βc + 1 for all n ≥ 1. Let Wn be the solution of (4.1) for β = βn. By choosing a subsequence of {βn} if necessary, and taking into account Proposition 3.6, we may also assume that Wn converges, as n → ∞, to a solution W of (4.1) with β = βc. We now set V := e^{κ1(W0−Wn)} and q̄n = argmax H[Wn], where κ1 > 0 is the constant in (A3). Then, in view of Lemma 3.4 and Proposition 6.2, we observe that q̄nV ≤ e^{−κ1 λ∗(βn)} V ≤ V in S \ supp r. This implies that inf_{S\supp r} V ≥ min_{supp r} V; otherwise, Theorem 3.2 (i) would yield that the q̄n-chain is transient, which is inconsistent

with Proposition 4.6 (recall that λ∗(βn) > 0 for βn > βc, so the q̄n-chain is positive recurrent). Hence, inf_{S\supp r} V ≥ min_{supp r} V. Using Proposition 3.6 and the fact that W0 = 0 in supp r, we have

$$ \inf_{S\setminus \mathrm{supp}\, r}(W_0 - W_n) \ge \min_{\mathrm{supp}\, r}(W_0 - W_n) \ge -\max_{\mathrm{supp}\, r} |W_n| \ge -K \tag{6.3} $$

for some K > 0 independent of n. Hence Wn ≤ W0 + K in S. Sending n → ∞, we obtain W ≤ W0 + K in S. We can also see from (6.3) and Proposition 6.2 that sup_{S\supp r} Wn ≤ max_{supp r} Wn. Letting n → ∞, we have sup_{S\supp r} W ≤ max_{supp r} W. Hence, we have completed the proof.

We are now ready to prove Theorem 2.3.

Proof of Theorem 2.3. We first show that βc is positive if and only if the p-chain is transient. Suppose that the p-chain is transient. We argue by contradiction, assuming that βc = 0. Let W be the solution of (4.1) with β = βc constructed in Proposition 6.3, and set q̄ = argmax H[W]. Then W = H[W] + βc r ≤ q̄W in S. Since V := −W attains its minimum at some point in supp r, we see in view of Lemma 3.1 (ii) that W is constant in S. But this is a contradiction, since W(xn) → −∞ as n → ∞ along the sequence {xn} ⊂ S taken in Proposition 6.2. Hence βc > 0. Suppose next that the p-chain is recurrent; we show that βc = 0. Assume, to the contrary, that βc > 0. Then, since W ≡ 0 is a solution of (4.1) with β = 0 < βc and p = argmax H[0], we see in view of Proposition 6.1 that the p-chain is transient, a contradiction. Hence βc = 0.

Now, let W be any solution of (4.1), and set q̄ = argmax H[W]. Then, in view of Propositions 6.1 and 4.6, we see that the q̄-chain is transient for β < βc and positive recurrent for β > βc. Hence, we have completed the proof of Theorem 2.3.

The recurrence or transience of the q̄-chain for β = βc is more delicate. We give here a sufficient condition under which the q̄-chain is recurrent. Notice first that the q̄-chain is recurrent if the p-chain is recurrent. Indeed, suppose that the p-chain is recurrent. Then, in view of Theorem 2.3, we see that βc = 0, which implies that W ≡ 0 is the unique solution of (4.1) with β = βc = 0. Uniqueness can be verified as follows. Let W′ be another solution of (4.1) with β = βc = 0, and set V := e^{κ1(W′−0)}. Then, noting the recurrence of p and applying Lemmas 3.1 (ii) and 3.4, we conclude that W′ = 0 in S. In particular, we have q̄ = argmax H[0] = p, which clearly implies that the q̄-chain is recurrent. Thus, it suffices to consider the case where the p-chain is transient.

Proposition 6.4. Let the p-chain be transient. Assume that there exists a finite set A ⊂ S such that the function w(x) := P^p_x(τ_A < ∞), where τ_A := inf{n ≥ 0 | X_n ∈ A}, converges to zero as d(x) → ∞. Then there exists a unique solution W of (4.1) with β = βc. Moreover, the Markov chain associated with q̄ = argmax H[W] is recurrent.

Proof. Set W0 := (1/κ2) log w, where κ2 is the constant in (A3). Note that W0(x) → −∞ as d(x) → ∞. Then, following the arguments of Propositions 6.2 and 6.3, one can see that there exists a solution W of (4.1) with β = βc such that sup_S(W − W0) < ∞. We now set q̄ := argmax H[W] and prove the recurrence of the q̄-chain. Since W(x) → −∞ as d(x) → ∞ and W = H[W] = q̄W − c(·, q̄) ≤ q̄W in S \ supp r, we can apply Theorem 3.2 (ii) with V := −W to conclude that the q̄-chain is recurrent. We next show that the solution of (4.1) with β = βc is unique. Let W′ be another solution of (4.1) with β = βc. Set V := e^{κ1(W′−W)}, where κ1 is the constant in (A3). Then q̄V ≤ V in S by Lemma 3.4. Since the q̄-chain is recurrent, we see that V attains a minimum at some point in S. Applying Lemma 3.1 (ii), we conclude that V is constant in S. Hence W′ = W in S.

Remark 6.5. The hypothesis of Proposition 6.4 is satisfied if S = Z^d and p generates a strongly aperiodic transient random walk on Z^d with d ≥ 3. See, for instance, [18, Proposition 25.3].

We end this section with some estimates on βc. In order to state the result precisely, recall that r(x) = Σ_{i=1}^{l} αi 1_{{xi}}(x) in S for some x1, . . . , xl ∈ S and α1, . . . , αl > 0. Set Ti := inf{n ≥ 1 | Xn = xi} for 1 ≤ i ≤ l, and T := min_{1≤i≤l} Ti, where inf ∅ = +∞. Define the (l × l)-matrix Uβ = (u_{ij}) by

$$ u_{ij} := e^{\alpha_i \beta}\, P^p_{x_i}(T = T_j < \infty), \qquad 1 \le i,j \le l. $$

Since Uβ is nonnegative and irreducible for any β ≥ 0, one can define its Perron-Frobenius eigenvalue, i.e., the largest real eigenvalue of Uβ, which we denote by σ_PF(Uβ). Then the following theorem holds.

Theorem 6.6. Let βc be the constant given in Theorem 2.1, and set β0 := sup{β ∈ R | σ_PF(Uβ) < 1}. Then

$$ \frac{1}{\kappa_2}\,\max\{0, \beta_0\} \le \beta_c \le \frac{1}{\kappa_1}\,\max\{0, \beta_0\}, $$

where κ1, κ2 > 0 are the constants in (A3).

In order to prove Theorem 6.6, we start with the simplest case, where a(x) ≡ 1 in (A3).

Lemma 6.7. Suppose that a(x) ≡ 1 in (A3). Let W be a solution of (4.1), and set q̄ = argmax H[W]. Then

$$ P^{\bar q}_{x_i}(T = T_j < \infty) = e^{W(x_j)-W(x_i)}\, e^{\alpha_i \beta} \sum_{n=1}^{\infty} e^{-n\lambda^*(\beta)}\, P^p_{x_i}(T = T_j = n) $$

for any 1 ≤ i, j ≤ l.

Proof. In view of equation (4.1) and Lemma 3.3, we observe that, for any x, y ∈ S,

$$ \bar q(x,y) = \frac{p(x,y)\,e^{W(y)}}{(p\,e^{W})(x)} = e^{W(y)-W(x)}\, e^{-\lambda^*(\beta)}\, e^{\beta r(x)}\, p(x,y). \tag{6.4} $$

Taking into account this formula, we have Pqx¯i (T

= Tj = n) = =



n ∏

y1 ,...,yn−1 ̸∈supp r k=1 n ∑ ∏

q¯(yk−1 , yk ) eW (yk )−W (yk−1 ) e−λ

∗ (β)

eβr(yk−1 ) p(yk−1 , yk )

y1 ,...,yn−1 ̸∈supp r k=1

= eW (xj )−W (xi ) e−nλ

∗ (β)

eβr(xi ) Ppxi (T = Tj = n),

where we have set $y_0 := x_i$ and $y_n := x_j$. Taking the sum with respect to $n$, we obtain the desired equality.

Proposition 6.8. Assume that $a(x) \equiv 1$ in (A3). Let $\beta_c$ be the constant given in Theorem 2.1. Then $\beta_c = \max\{0, \beta_0\}$, where $\beta_0 := \sup\{\beta \in \mathbb{R} \mid \sigma_{PF}(U_\beta) < 1\}$.

Proof. If the $p$-chain is recurrent, then $\sum_{j=1}^{l} u_{ij} \ge 1$ for $1 \le i \le l$, which implies that $\sigma_{PF}(U_\beta) \ge 1$ for any $\beta \ge 0$. In particular, $\beta_0 \le 0$, so that $\max\{0, \beta_0\} = 0$. On the other hand, $\beta_c = 0$ in view of Theorem 2.3. Hence, the claim is valid. It thus remains to consider the case where the $p$-chain is transient.

Let $W$ be a solution of (4.1), and set $\bar q = \operatorname{argmax} H[W]$. For each $k \ge 0$, we define $g_k = (g_k(i))_{1 \le i \le l} \in \mathbb{R}^l$ by $g_k(i) := P^{\bar q}_{x_i}(N \ge k)$, where $N$ denotes the total number of visits to the set $\operatorname{supp} r$, that is,
\[
N := \sum_{n=1}^{\infty} \sum_{j=1}^{l} 1_{\{x_j\}}(X_n).
\]

Set $\bar u_{ij} := P^{\bar q}_{x_i}(T = T_j < \infty)$ and define the $(l \times l)$-matrix $\bar U := (\bar u_{ij})$. Then, in view of the strong Markov property, we see that, for any $k \ge 1$,
\[
\begin{aligned}
g_k(i) = P^{\bar q}_{x_i}(N \ge k) &= \sum_{j=1}^{l} P^{\bar q}_{x_i}(N \ge k \mid T = T_j < \infty)\, P^{\bar q}_{x_i}(T = T_j < \infty) \\
&= \sum_{j=1}^{l} P^{\bar q}_{x_j}(N \ge k - 1)\, P^{\bar q}_{x_i}(T = T_j < \infty) = \sum_{j=1}^{l} \bar u_{ij}\, g_{k-1}(j).
\end{aligned}
\]
In particular, $g_k = \bar U g_{k-1}$. Using this relation repeatedly, we obtain $g_k = \bar U^k g_0 = \bar U^k \mathbf{1}$, where $\mathbf{1} := {}^t(1, \dots, 1) \in \mathbb{R}^l$.
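The identity $g_k = \bar U^k \mathbf{1}$ turns $E^{\bar q}_{x_i}[N] = \sum_{k \ge 1} g_k(i)$ into a matrix geometric series, which converges exactly when the spectral radius of $\bar U$ is below one, in which case $\sum_{k \ge 1} \bar U^k \mathbf{1} = (I - \bar U)^{-1} \bar U \mathbf{1}$. A quick numerical check of this identity (the matrix below is a made-up example, not data from the paper):

```python
import numpy as np

# Hypothetical return-probability matrix u_bar[i][j] = P^qbar_{x_i}(T = T_j < infty).
U_bar = np.array([[0.30, 0.15],
                  [0.10, 0.40]])

rho = max(abs(np.linalg.eigvals(U_bar)))  # spectral radius = sigma_PF here
assert rho < 1.0, "geometric series diverges: supp r is visited infinitely often"

ones = np.ones(2)
# E[N] = sum_{k>=1} U_bar^k 1 = (I - U_bar)^{-1} U_bar 1  (closed form)
EN_closed = np.linalg.solve(np.eye(2) - U_bar, U_bar @ ones)

# Cross-check against a truncated sum of g_k = U_bar^k 1.
EN_sum, g = np.zeros(2), ones.copy()
for _ in range(10_000):
    g = U_bar @ g        # g_k = U_bar g_{k-1}
    EN_sum += g
print(EN_closed, EN_sum)  # the two agree to machine precision
```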


We now suppose that $\beta < \beta_c$. Since $\lambda^*(\beta) = 0$, we see in view of Lemma 6.7 that $(\bar U^k)_{ij} = e^{W(x_j) - W(x_i)} (U_\beta^k)_{ij}$ for any $k \ge 1$ and $1 \le i, j \le l$. Setting $C := \max_{1 \le i, j \le l} (W(x_j) - W(x_i))$ and noting that the $\bar q$-chain is transient, we obtain
\[
e^{-C} \sum_{k=1}^{\infty} (U_\beta^k \mathbf{1})(i) \le \sum_{k=1}^{\infty} g_k(i) = \sum_{k=1}^{\infty} P^{\bar q}_{x_i}(N \ge k) = E^{\bar q}_{x_i}[N] < \infty.
\]
Hence, $\sigma_{PF}(U_\beta) < 1$. Since $\beta < \beta_c$ is arbitrary, we have $\beta_c \le \beta_0$.

We next claim that $\beta_c = \beta_0$. We argue by contradiction, assuming that $\beta_c < \beta_0$. Fix any $\beta_c < \beta < \beta_0$. Then, in view of Lemma 6.7, we have $\bar u_{ij} < e^{W(x_j) - W(x_i)} u_{ij}$, and therefore $(\bar U^k)_{ij} < e^{W(x_j) - W(x_i)} (U_\beta^k)_{ij}$ for any $k \ge 1$ and $1 \le i, j \le l$. Thus, we obtain
\[
E^{\bar q}_{x_i}[N] = \sum_{k=1}^{\infty} g_k(i) = \sum_{k=1}^{\infty} (\bar U^k \mathbf{1})(i) < e^{C} \sum_{k=1}^{\infty} (U_\beta^k \mathbf{1})(i).
\]

But this is a contradiction: the right-hand side is finite (recall that $\sigma_{PF}(U_\beta) < 1$ for $\beta < \beta_0$), whereas the left-hand side is infinite by the choice of $\beta$ (for $\beta > \beta_c$ the $\bar q$-chain is positive recurrent, so that $N = \infty$ almost surely). Hence, $\beta_c = \beta_0$, and we have completed the proof.

Remark 6.9. From Proposition 6.8 and Theorem 2.2, together with Remark 2.4, one can obtain the critical value for $F(\beta)$ defined by (1.4). More precisely, let $\beta_c := \max\{0, \beta_0\}$, where $\beta_0 := \sup\{\beta \in \mathbb{R} \mid \sigma_{PF}(U_\beta) < 1\}$. Then $F(\beta) = 0$ for all $0 \le \beta \le \beta_c$ and $F(\beta) > 0$ for all $\beta > \beta_c$. Assume moreover that the support of $r = r(x)$ consists of a single point, say, $r(x) = 1_{\{0\}}(x)$. Then $\beta_c = -\log P^p_0(T < \infty)$, where $T := \inf\{n \ge 1 \mid X_n = 0\}$. On the other hand, let $\hat F(\beta)$ be its continuous-time counterpart defined by (1.6). Then it is well known (e.g., [6, 11]) that the critical value $\beta_c$ for $\hat F(\beta)$ can be represented as
\[
\beta_c = \left( (2\pi)^{-d} \int_{\mathbb{T}^d} \frac{d\theta}{1 - g(\theta)} \right)^{-1}, \qquad g(\theta) := \frac{1}{d} \sum_{j=1}^{d} \cos \theta_j, \quad \theta = (\theta_1, \dots, \theta_d),
\]
where $\mathbb{T}^d := [-\pi, \pi)^d$ stands for the $d$-dimensional torus. This value is equal to the probability that the continuous-time random walk on $\mathbb{Z}^d$ starting from the origin and generated by $(1/2)\Delta$, where $\Delta$ denotes the discrete Laplacian, does not hit the origin at any time $t > 0$. Notice here that the critical value $\beta_c$ for $\hat F(\beta)$ does not coincide with that of $F(\beta)$, in general, since it depends on whether $X = (X_n)$ is continuous-time or discrete-time.
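Both critical values in Remark 6.9 can be evaluated numerically for $d = 3$. Writing $G$ for the integral $(2\pi)^{-3} \int_{\mathbb{T}^3} (1 - g(\theta))^{-1}\, d\theta$ (the Green function of the walk at the origin), the continuous-time threshold is $1/G$, the return probability of the simple random walk is $1 - 1/G$, and the single-site formula then gives the discrete-time threshold $-\log(1 - 1/G)$. A crude midpoint-rule sketch (our own implementation; the grid size is arbitrary):

```python
import numpy as np

# Continuous-time critical value from Remark 6.9 for d = 3:
#   beta_c = ( (2*pi)^{-3} * int_{T^3} dtheta / (1 - g(theta)) )^{-1},
# with g(theta) = (cos t1 + cos t2 + cos t3) / 3.  We use a midpoint rule
# with an even number of points per axis, so the integrable singularity
# at theta = 0 is never sampled.
n = 100                                  # grid points per axis (even, arbitrary)
t = -np.pi + (np.arange(n) + 0.5) * (2 * np.pi / n)
T1, T2, T3 = np.meshgrid(t, t, t, indexing="ij")
g = (np.cos(T1) + np.cos(T2) + np.cos(T3)) / 3.0
G = np.mean(1.0 / (1.0 - g))             # mean over the cube = (2*pi)^{-3} * integral

beta_c_cont = 1.0 / G                    # approx 0.659 (escape probability)
p_return = 1.0 - 1.0 / G                 # return probability of SRW on Z^3, approx 0.341
beta_c_disc = -np.log(p_return)          # single-site discrete-time value, approx 1.077
print(beta_c_cont, beta_c_disc)
```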

We are now in a position to prove Theorem 6.6.

Proof of Theorem 6.6. It suffices to consider the case where the $p$-chain is transient. Let $\lambda^*_i(\beta)$, $i = 1, 2$, denote the constant defined by (2.6) with $a(x) \equiv \kappa_i$ in (A3). We set $\beta_i := \max\{\beta \ge 0 \mid \lambda^*_i(\beta) = 0\}$ for $i = 1, 2$. Then it is easy to see that $\lambda^*_1(\beta) \le \lambda^*(\beta) \le \lambda^*_2(\beta)$ for all $\beta \ge 0$, which implies that $\beta_2 \le \beta_c \le \beta_1$. On the other hand, let $\lambda^*_0(\beta)$ be the constant defined by (2.6) with $a(x) \equiv 1$ in (A3). Then we have $\lambda^*_i(\beta) = \kappa_i^{-1} \lambda^*_0(\kappa_i \beta)$ for all $\beta \ge 0$ and $i = 1, 2$. In particular,
\[
\beta_i = \max\{\beta \ge 0 \mid \lambda^*_0(\kappa_i \beta) = 0\} = \frac{1}{\kappa_i} \max\{\beta \ge 0 \mid \lambda^*_0(\beta) = 0\}.
\]

This, together with Proposition 6.8, implies that $\beta_i = \kappa_i^{-1} \beta_0$ for $i = 1, 2$. Hence, we have completed the proof.

Remark 6.10. If $a(x) \equiv 1$ in (A3), then $W$ is a solution of (4.1) if and only if $\phi(x) = e^{W(x)}$ is a positive solution to the linear equation
\[
P\phi = \phi \ \text{ in } S, \qquad \phi(x_0) = 1, \tag{6.5}
\]
where $P = P(x, y) := e^{-\lambda^*(\beta)}\, e^{\beta r(x)}\, p(x, y)$. Furthermore, $\bar q = \operatorname{argmax} H[W]$ can be written as
\[
\bar q(x, y) = \frac{p(x, y)\, e^{W(y)}}{(p e^{W})(x)} = \frac{P(x, y)\, \phi(y)}{\phi(x)}, \qquad x, y \in S,
\]
which coincides with Doob's $h$-transform of $P$ by $\phi$.
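In matrix form, the $h$-transform is a one-line operation: whenever $\phi > 0$ satisfies $P\phi = \phi$, the kernel $\bar q(x, y) = P(x, y)\phi(y)/\phi(x)$ is automatically stochastic, since each row sums to $(P\phi)(x)/\phi(x) = 1$. A finite-state sketch (the matrix is a toy example of ours; we manufacture a $P$ with $P\phi = \phi$ by normalizing a positive matrix by its Perron eigenvalue):

```python
import numpy as np

# Toy positive kernel; rescale by its Perron eigenvalue so that P phi = phi
# has a positive solution (illustration only -- in the paper P arises from
# e^{-lambda*(beta)} e^{beta r(x)} p(x, y) on the infinite state space S).
M = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])
eigvals, eigvecs = np.linalg.eig(M)
k = np.argmax(eigvals.real)
rho = eigvals[k].real
phi = np.abs(eigvecs[:, k].real)   # Perron eigenvector, strictly positive
P = M / rho                        # now P @ phi == phi

# Doob h-transform: qbar(x, y) = P(x, y) * phi(y) / phi(x)
qbar = P * phi[None, :] / phi[:, None]
print(qbar.sum(axis=1))            # each row sums to 1: qbar is stochastic
```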

Acknowledgments. The author would like to thank the anonymous referee for valuable comments, as well as for indicating the reference [13], which helped the author improve the presentation of this paper. This research was supported in part by JSPS KAKENHI Grant Numbers 24740089 and 15K04935.

Appendix

In this appendix we collect some fundamental facts on the solvability of optimality equations in finite state spaces. Let $A$ be a finite set, and let $p = p(x, y)$ be an irreducible stochastic matrix on $A$. We set $B_A := \{(x, y) \in A \times A \mid p(x, y) > 0\}$ and regard $(A, B_A)$ as a directed graph with graph distance $d_A(x, y) := \inf\{n \ge 0 \mid p^n(x, y) > 0\}$. Since $A$ is finite, there exists an $M > 0$ such that $d_A(y, x) \le M$ for any $(x, y) \in B_A$. Let $Q_A$ be the set of stochastic matrices $q = q(x, y)$ on $A$ such that $q(x, y) = 0$ for all $(x, y) \notin B_A$. For a given bounded function $L : A \times Q_A \to \mathbb{R}$, we set
\[
H_A[f](x) := \sup_{q \in Q_A} \{ (qf)(x) + L(x, q) \}, \qquad x \in A,
\]
where $f : A \to \mathbb{R}$. Let us denote by $\operatorname{argmax} H_A[f]$ the totality of $\bar q \in Q_A$ such that $H_A[f](x) = (\bar q f)(x) + L(x, \bar q)$ for all $x \in A$. In what follows, we assume that $\operatorname{argmax} H_A[f] \ne \emptyset$ for any $f : A \to \mathbb{R}$.

Now, for each $\alpha \in (0, 1]$, we consider the following equation:
\[
V = H_A[\alpha V] \ \text{ in } A. \tag{A.1}
\]
As usual, we say that $V$ is a supersolution (resp. subsolution) of (A.1) if $V \ge H_A[\alpha V]$ in $A$ (resp. $V \le H_A[\alpha V]$ in $A$), and that $V$ is a solution of (A.1) if it is both a sub- and a supersolution. We first observe the following comparison principles.

Theorem A.1. (i) Let $V_1$ and $V_2$ be, respectively, sub- and supersolutions of (A.1) for some $\alpha \in (0, 1)$. Then $V_1 \le V_2$ in $A$.
(ii) Let $V_1$ and $V_2$ be, respectively, sub- and supersolutions of $V = H_A[V]$ in a subset $D \subset A$. Suppose that there exists an irreducible $\bar q \in \operatorname{argmax} H_A[V_1]$. Then $V_1 \le V_2$ in $A \setminus D$ implies $V_1 \le V_2$ in $A$.

Proof. We first prove (i). Set $W := V_2 - V_1$ and choose any $\bar q \in \operatorname{argmax} H_A[\alpha V_1]$. Then $\alpha (\bar q W) \le W$ in $A$. We prove that $\min_A W \ge 0$, arguing by contradiction. Assume that $\min_A W = W(x) < 0$ for some $x \in A$. Then
\[
0 \le \alpha \sum_{y \in A} \bar q(x, y)\, (W(y) - W(x)) \le (1 - \alpha) W(x) < 0,
\]
which is a contradiction. Hence, $V_1 \le V_2$ in $A$.

We next prove (ii). We may assume without loss of generality that $D \ne \emptyset$. Set $W := V_2 - V_1$ and assume that $\min_D W = W(x) < 0$ for some $x \in D$. We choose a $\bar q \in \operatorname{argmax} H_A[V_1]$ which is irreducible in $A$. Then, since $\bar q W \le W$ in $A$, we have
\[
0 \le \sum_{y \in A} \bar q(x, y)\, (W(y) - W(x)) \le 0.
\]

This implies that $W(y) = W(x)$ for any $y \in A$ with $\bar q(x, y) > 0$. Repeating this argument and using the irreducibility of $\bar q$, one can find a $(y, z) \in B_A$ such that $y \in D$, $z \in A \setminus D$, $W(y) = \min_D W < 0$, and $\bar q(y, z) > 0$. In particular, we have
\[
0 < \bar q(y, z)\, (W(z) - W(y)) \le (\bar q W)(y) - W(y) \le 0.
\]
But this is a contradiction. Hence, $W \ge 0$ in $D$, and the proof is complete.

Theorem A.2. Assume that $\alpha \in (0, 1)$. Then there exists a unique solution $V_\alpha$ of (A.1). Furthermore, for any $x, y \in A$,
\[
(1 - \alpha)\, |V_\alpha(x)| \le \sup_{A \times Q_A} |L|, \qquad \alpha\, |V_\alpha(y) - V_\alpha(x)| \le 2M d_A(x, y) \sup_{A \times Q_A} |L|. \tag{A.2}
\]

Proof. The existence part can be shown by the standard value iteration argument. Uniqueness follows from Theorem A.1 (i). In order to obtain the estimates (A.2), we set $K_0 := \sup_{A \times Q_A} |L|$, $V_1 := -K_0/(1 - \alpha)$, and $V_2 := K_0/(1 - \alpha)$. Since $V_1$ and $V_2$ are, respectively, sub- and supersolutions of (A.1), we have $V_1 \le V_\alpha \le V_2$ in $A$ by Theorem A.1 (i). Hence, the first estimate in (A.2) is valid.

To prove the second estimate, fix any $(x, y) \in B_A$. Then $V_\alpha(x) \ge \alpha V_\alpha(y) + L(x, \delta_y)$, which implies that
\[
\alpha (V_\alpha(y) - V_\alpha(x)) \le (1 - \alpha) V_\alpha(x) - L(x, \delta_y) \le 2 \sup_{A \times Q_A} |L|.
\]
Since $d_A(y, x) \le M$, we easily see that $\alpha (V_\alpha(x) - V_\alpha(y)) \le 2M \sup_{A \times Q_A} |L|$. Hence, the second estimate is valid if $(x, y) \in B_A$. For the general case, we fix any $x, y \in A$. Then there exists a finite sequence $y_0, y_1, \dots, y_n \in A$ with $n := d_A(x, y)$ such that $y_0 = x$, $y_n = y$, and $(y_{j-1}, y_j) \in B_A$ for each $j$. Applying the previous estimate repeatedly, we obtain
\[
\alpha |V_\alpha(y) - V_\alpha(x)| \le \sum_{j=1}^{n} \alpha |V_\alpha(y_j) - V_\alpha(y_{j-1})| \le 2M d_A(x, y) \sup_{A \times Q_A} |L|.
\]
Hence, we have completed the proof.
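The value iteration invoked in the existence part is easy to reproduce: $f \mapsto H_A[\alpha f]$ is an $\alpha$-contraction in the sup norm (being a pointwise supremum of affine $\alpha$-contractions), so iterating it from any starting point converges geometrically to $V_\alpha$. A self-contained sketch; for simplicity we take the supremum over a finite list of candidate kernels, which is our own simplification of $Q_A$, and the toy data are ours:

```python
import numpy as np

def H_A(f, Qs, L, alpha=1.0):
    # H_A[alpha*f](x) = max over candidate kernels q of (q(alpha*f))(x) + L(x, q).
    # Qs: list of stochastic matrices; L[m, x]: payoff of kernel m at state x.
    vals = np.stack([q @ (alpha * f) + L[m] for m, q in enumerate(Qs)])
    return vals.max(axis=0)

def solve_discounted(Qs, L, alpha, tol=1e-12):
    # Iterate V <- H_A[alpha * V]; an alpha-contraction, so this converges to V_alpha.
    V = np.zeros(Qs[0].shape[0])
    while True:
        V_new = H_A(V, Qs, L, alpha)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Tiny example: two states, two candidate kernels (hypothetical data).
Qs = [np.array([[0.9, 0.1], [0.5, 0.5]]),
      np.array([[0.2, 0.8], [0.7, 0.3]])]
L = np.array([[1.0, 0.0],     # L[m, x]: running payoff for kernel m at state x
              [0.0, 0.5]])
V = solve_discounted(Qs, L, alpha=0.95)
print(V, np.max(np.abs(V - H_A(V, Qs, L, 0.95))))  # residual ~ 0: V solves (A.1)
```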

We now fix a reference point $x_0 \in A$, and consider the equation
\[
\lambda + W = H_A[W] \ \text{ in } A, \qquad W(x_0) = 0, \tag{A.3}
\]

where the unknown is the pair of a real constant $\lambda$ and a function $W : A \to \mathbb{R}$.

Theorem A.3. There exists a solution $(\lambda, W)$ of (A.3) such that
\[
|\lambda| \le \sup_{A \times Q_A} |L|, \qquad |W(y) - W(x)| \le 2M d_A(x, y) \sup_{A \times Q_A} |L|, \qquad x, y \in A.
\]
Moreover, if there exists an irreducible $\bar q \in \operatorname{argmax} H_A[f]$ for any $f : A \to \mathbb{R}$, then the solution of (A.3) is unique.

Proof. We first prove the existence. Let $V_\alpha$ be the solution of (A.1), and set $\lambda_\alpha := (1 - \alpha) V_\alpha(x_0)$ and $W_\alpha := V_\alpha - V_\alpha(x_0)$. Then $(\lambda_\alpha, W_\alpha)$ satisfies
\[
\lambda_\alpha + W_\alpha = H_A[\alpha W_\alpha] \ \text{ in } A, \qquad W_\alpha(x_0) = 0. \tag{A.4}
\]
Since $|\lambda_\alpha| \le C$ and $|W_\alpha(x)| \le 2M d_A(x_0, x)\, C$ with $C := \sup_{A \times Q_A} |L|$, one can find an increasing sequence $\{\alpha_n\}$ converging to $1$ as $n \to \infty$ such that $\lambda_n := \lambda_{\alpha_n} \to \lambda$ and $W_n(x) := W_{\alpha_n}(x) \to W(x)$ as $n \to \infty$ for some $\lambda \in \mathbb{R}$ and $W : A \to \mathbb{R}$. By virtue of Theorem A.2, we observe that $(\lambda, W)$ enjoys the desired estimates. It is also easy to see that $(\lambda, W)$ solves (A.3) by letting $n \to \infty$ in (A.4) with $\alpha = \alpha_n$. Hence, we have proved the existence of a solution to (A.3) satisfying the desired estimates.

We next prove the uniqueness. Let $(\lambda_1, W_1)$ and $(\lambda_2, W_2)$ be sub- and supersolutions of (A.3), respectively. We first show that $\lambda_1 \le \lambda_2$, which yields the uniqueness of $\lambda$ by interchanging the roles of $(\lambda_1, W_1)$ and $(\lambda_2, W_2)$. Set $W := W_2 - W_1$ and choose a $\bar q \in \operatorname{argmax} H_A[W_1]$ which is irreducible. Then we see that
\[
\lambda_2 - \lambda_1 + W(x) \ge (\bar q W)(x), \qquad x \in A. \tag{A.5}
\]
Since $A$ is finite and $\bar q$ is irreducible, there exists a unique invariant probability measure $\pi$ of $\bar q$ on $A$. Multiplying both sides of (A.5) by $\pi$, we have
\[
\lambda_2 - \lambda_1 + \pi W \ge \pi (\bar q W) = \pi W.
\]
Hence, $\lambda_2 \ge \lambda_1$.

Now, let $\lambda$ be the unique constant for which (A.3) admits a solution. Then, from (A.5), we have $\bar q W \le W$ in $A$. Applying the same argument as in the proof of Lemma 3.1 (ii), we see that $W$ is constant in $A$. Since $W(x_0) = 0$, we have $W_1 = W_2$ in $A$. Hence, we have completed the proof.
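The existence proof of Theorem A.3 is likewise constructive: solve (A.1) for $\alpha$ close to $1$ and read off $\lambda_\alpha = (1 - \alpha) V_\alpha(x_0)$ and $W_\alpha = V_\alpha - V_\alpha(x_0)$. A sketch with the same finite candidate-set simplification and toy data as above (our own, not from the paper):

```python
import numpy as np

def solve_discounted(Qs, L, alpha, tol=1e-10):
    # Value iteration for V = max_q (alpha * q V + L(., q)) on a finite set A.
    V = np.zeros(Qs[0].shape[0])
    while True:
        V_new = np.stack([alpha * (q @ V) + L[m] for m, q in enumerate(Qs)]).max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

Qs = [np.array([[0.9, 0.1], [0.5, 0.5]]),
      np.array([[0.2, 0.8], [0.7, 0.3]])]
L = np.array([[1.0, 0.0],
              [0.0, 0.5]])
x0 = 0  # reference point

# Vanishing discount: lambda_alpha = (1 - alpha) * V_alpha(x0) and
# W_alpha = V_alpha - V_alpha(x0) converge as alpha -> 1 (Theorem A.3).
for alpha in [0.9, 0.99, 0.999]:
    V = solve_discounted(Qs, L, alpha)
    lam, W = (1 - alpha) * V[x0], V - V[x0]
    print(f"alpha={alpha}: lambda_alpha={lam:.6f}, W_alpha={W}")
```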

References

[1] A. Arapostathis, V. S. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, S. I. Marcus, Discrete-time controlled Markov processes with average cost criterion: a survey, SIAM J. Control Optim., 31 (1993), pp. 282–344.
[2] R. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
[3] P. Brémaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Texts in Applied Mathematics 31, Springer-Verlag, New York, 1999.
[4] R. Cavazos-Cadena, D. Hernández-Hernández, A system of Poisson equations for a nonconstant Varadhan functional on a finite state space, Appl. Math. Optim., 53 (2006), pp. 101–119.
[5] R. Cavazos-Cadena, D. Hernández-Hernández, Necessary and sufficient conditions for a solution to the risk-sensitive Poisson equation on a finite state space, Systems Control Lett., 58 (2009), pp. 254–258.
[6] M. Cranston, S. Molchanov, On phase transitions and limit theorems for homopolymers, in Probability and Mathematical Physics, CRM Proc. Lecture Notes 42, Amer. Math. Soc., Providence, RI, 2007, pp. 97–112.
[7] M. Cranston, L. Koralov, S. Molchanov, B. Vainberg, Continuous model for homopolymers, J. Funct. Anal., 256 (2009), pp. 2656–2696.
[8] A. Dembo, O. Zeitouni, Large Deviations Techniques and Applications, Second ed., Applications of Mathematics 38, Springer-Verlag, New York, 1998.
[9] W. H. Fleming, D. Hernández-Hernández, Risk-sensitive control of finite state machines on an infinite horizon I, SIAM J. Control Optim., 35 (1997), pp. 1790–1810.
[10] C. Graham, Markov Chains: Analytic and Monte Carlo Computations, Wiley Series in Probability and Statistics, John Wiley & Sons, Paris, 2014.
[11] F. Hiroshima, I. Sasaki, T. Shirai, A. Suzuki, Note on the spectrum of discrete Schrödinger operators, J. Math-for-Ind., 4B (2012), pp. 105–108.
[12] F. den Hollander, Large Deviations, Fields Institute Monographs 14, Amer. Math. Soc., Providence, RI, 2000.
[13] A. Hordijk, Dynamic Programming and Markov Potential Theory, Math. Centre Tracts 51, Mathematisch Centrum, Amsterdam, 1974.
[14] N. Ichihara, Criticality of viscous Hamilton-Jacobi equations and stochastic ergodic control, J. Math. Pures Appl., 100 (2013), pp. 368–390.
[15] N. Ichihara, The generalized principal eigenvalue for Hamilton-Jacobi-Bellman equations of ergodic type, Ann. Inst. H. Poincaré Anal. Non Linéaire, 32 (2015), pp. 623–650.
[16] M. L. Puterman, Markov Decision Processes, John Wiley & Sons, New York, 1994.
[17] S. M. Ross, Introduction to Stochastic Dynamic Programming, Academic Press, New York, 1983.
[18] F. Spitzer, Principles of Random Walk, Second ed., Graduate Texts in Mathematics 34, Springer-Verlag, New York-Heidelberg, 1976.
