Int J Game Theory (2010) 39:53–69 DOI 10.1007/s00182-009-0193-2
Explicit formulas for repeated games with absorbing states
Rida Laraki
Accepted: 27 October 2009 / Published online: 1 December 2009 © Springer-Verlag 2009
Abstract Explicit formulas are given for the asymptotic value lim_{λ→0} v(λ) and the asymptotic minmax lim_{λ→0} w(λ) of finite λ-discounted absorbing games, together with new simple proofs of the existence of the limits as λ goes to zero. Similar characterizations for stationary Nash equilibrium payoffs are obtained. The results may be extended to absorbing games with compact metric action sets and jointly-continuous payoff functions.
Keywords Absorbing discounted games · Asymptotic analysis · Explicit formulas
1 Introduction “When people interact, they have usually interacted in the past, and expect to do so again in the future. It is this ongoing element that is studied by the theory of repeated games”.1
I am honored to publish this paper in the special issue in the memory of a great game theorist: Michael Maschler. My PhD dissertation was based in large part on his work with Robert Aumann on repeated games with incomplete information. R. Laraki—Part time associated with Équipe Combinatoire et Optimisation, Université Paris 6. 1 Aumann and Maschler (1995, p. xi).
R. Laraki (B) CNRS, Economics Department, Ecole Polytechnique, Paris, France e-mail:
[email protected] R. Laraki Équipe Combinatoire et Optimisation, Université Paris 6, Paris, France
Aumann and Maschler (1995) introduced in 1966–1968 repeated games with incomplete information. Their approach was revolutionary in many aspects and generated a huge and deep literature. One of the main contributions was the conceptual distinction between the several ways of evaluating the stream of payoffs in long interactions. The finitely repeated game Γn has “a definite duration, n, on which the players can base their strategies. Indeed, optimal strategies in Γn may be quite different for different n”. On the other hand, in the infinitely repeated game Γ∞ “the strategies are by definition independent of n. Thus, Γ∞ reflects properties of the game Γn that hold “uniformly” in the duration n […] By using an optimal strategy in Γ∞ (if there is one), a player guarantees in one fell swoop that in sufficiently long finite truncations, the outcome will not be appreciably worse in each Γn”.2 If vn denotes the value of the finitely repeated game Γn, lim_{n→∞} vn “tells the analyst something about repetitions that are “long”, without his having to know how long. But lim vn is only meaningful as the limit of values of games whose duration is precisely known to the players […] To analyze a situation in which the players themselves know only that the game is “long” […], Γ∞ is the appropriate model”.3 Let v∞ denote the uniform value of the repeated game (i.e. the value of Γ∞ if it exists). In addition to v∞ and lim vn, “there are two specific models of “long” repetitions that warrant discussion.
The first is the limit of the values vλ of the discounted game Γλ as the discount rate goes to zero […] this is conceptually closer to lim vn than to v∞ […] [and] most of the above discussion applies when λ is substituted for n […] On the other hand, discounted games Γλ are like Γ∞ —and unlike Γn— in that they have no fixed, commonly known last stage […] [and] it admits [optimal] strategies with some kind of stationarity property”.4 For repeated games with incomplete information on one side, Aumann and Maschler proved that v∞ exists, so that the asymptotic values lim vn and lim vλ exist and are equal to v∞. More importantly, they provided an explicit formula for the common value (their famous Cav(u) theorem) and showed that v∞ does not always exist for repeated games with incomplete information on both sides. Mertens and Zamir (1971) proved the existence of the asymptotic values lim vn = lim vλ in repeated games with incomplete information on both sides and provided an elegant system of functional equations that characterizes the common limit. Aumann and Maschler extended the existence of v∞ and its characterization to repeated games with incomplete information on one side, imperfect monitoring and state-dependent signaling (that is, the players do not fully observe the past moves of their opponents but only a signal that may depend, deterministically or stochastically, on the true state and the last moves). To study repeated games with symmetric incomplete information on both sides, imperfect monitoring and state-dependent public signaling, Kohlberg and Zamir (1974) reduced the existence of the uniform value in the deterministic case to the study of an absorbing game Γ*. Combined with the result of Kohlberg (1974), this implies the existence of the uniform value.
2 Aumann and Maschler (1995, p. 131). 3 Aumann and Maschler (1995, p. 132). 4 Aumann and Maschler (1995, p. 139). They used the notations Γδ and vδ for Γλ and vλ respectively.
Repeated games with absorbing states, in short absorbing games, are stochastic games in which only one state is non-absorbing. Stochastic games are repeated games in which a state variable follows a Markov chain controlled by the actions of the players. Shapley (1953) introduced the two-player zero-sum model with finitely many states and actions (the finite model). He proved the existence of the value vλ of the λ-discounted game5 by introducing a dynamic programming principle (called the Shapley operator). The idea of the Kohlberg and Zamir (1974) reduction is simple: each time an informative pair is played, the identity of the true state is revealed (i.e. the game is absorbed). “At about this time, it was realized that the games Γ* are particular instances of “stochastic games” in the sense of Shapley […] Motivated by the above application to repeated games, Bewley and Kohlberg managed to prove that lim vn exists for all stochastic games […] But though they tried hard, and obtained important partial results, they were unable to prove that v∞ exists for all stochastic games. This difficult problem was finally solved (positively) by Mertens and Neyman”.6 The Kohlberg and Zamir reduction has been extended by Neyman and Sorin (1998) to establish the existence of uniform equilibria in multi-player repeated games with symmetric incomplete information and non-deterministic public signaling. A similar reduction has been used by Abreu et al. (1991) to establish their famous formula characterizing the Pareto-optimal trigger equilibrium payoff in discounted repeated games with imperfect public monitoring as the discount factor goes to zero: a principal plan is played; if a bad public signal is observed, cooperation stops (i.e. the game is absorbed). The first and most famous absorbing game example, the big match, was introduced in Gillette (1957). Blackwell and Ferguson (1968) proved that the big match admits a uniform value under the full monitoring assumption.
Without that observation, v∞ may not exist, as shown in Blackwell and Ferguson (1968) and Coulomb (2001). Using an operator approach, Kohlberg (1974) proved the existence of the uniform value (and hence of the asymptotic values) in any finite absorbing game with full monitoring. The operator approach uses the additional information obtained from the derivative of the Shapley operator at λ = 0 to deduce the existence of lim vλ and its characterization via variational inequalities. Rosenberg and Sorin (2001) extended the Kohlberg operator approach for the asymptotic values to a large class of stochastic games that includes (1) compact and separately-continuous absorbing games7 and (2) repeated games with incomplete information on both sides (Aumann and Maschler 1995; Mertens and Zamir 1971). Mertens et al. (2009) combined the techniques of Mertens and Neyman (1981) and Rosenberg and Sorin (2001) to show the existence of the uniform value in compact and separately-continuous absorbing games with full monitoring. An algebraic approach allowed Bewley and Kohlberg (1976a, b) to prove the existence of the asymptotic values lim vλ = lim vn in every finite stochastic game. The
5 All the results on stochastic games cited in this introduction assume that the state at each stage is publicly known to the players and that the description of the game is common knowledge among the players.
6 Aumann and Maschler (1995, p. 217).
7 Meaning that action sets are compact and metric and payoff and transition functions are separately-continuous.
breakthrough came when Mertens and Neyman (1981) proved the existence of the uniform value v∞ in every finite stochastic game with full monitoring. To study long interactions, fixed-point theorems are in general not sufficient and more sophisticated methods need to be devised (Aumann and Maschler 1995; Bewley and Kohlberg 1976a; Coulomb 2001; Laraki 2001a, b; Mertens and Neyman 1981; Mertens and Zamir 1971; Rosenberg and Sorin 2001). Proving the existence of the asymptotic values or of the uniform value is an important theoretical contribution, but finding an explicit formula linking the data of the game and its value (as did Aumann and Maschler (1995) and Mertens and Zamir (1971)) allows numerical computations and enables the study of how changes in the underlying data affect the value of the game. Unfortunately, very few repeated games admit an explicit formula for the asymptotic or uniform values. In the large class of repeated games where lim vλ = lim vn = v∞, it is sufficient to establish the asymptotic formula in the discounted model (because of its stationarity). Inspired by the tools developed in the theory of zero-sum differential games with fixed duration, Laraki (2001a, b) used a variational approach to characterize the asymptotic value of discounted stochastic games in which each player controls a martingale (including the models of Aumann and Maschler (1995) and Mertens and Zamir (1971)). Following the line of research of Vrieze and Thuijsman (1989) and Flesch et al. (1996), a variational approach gives, for compact and jointly-continuous absorbing games, a new simple proof of the existence of lim vλ and its characterization as the value of a one-shot game. When the probability of absorption (but not the payoff upon absorption) is controlled by only one player (as in the big match of Gillette (1957)), the formula simplifies to the value of an underlying finite game. From Coulomb (2001), one may also deduce an involved formula for lim vλ.
Because his aim was different, he did not identify the associated asymptotic game, and his approach cannot be extended to compact absorbing games. Actually, Coulomb's work uses extensively the algebraic approach of Bewley and Kohlberg (1976a), which is only valid for the finite model. The minmax wλ of a multi-player λ-discounted absorbing game is the level at which a team of players could punish another player. Neyman (2003) proved the existence of the uniform minmax w∞ in finite absorbing games with full monitoring, implying in particular that lim wn = lim wλ = w∞. However, no explicit formula exists in the literature for lim wλ, and it is not known whether this limit exists in infinite absorbing games. Our tools allow (1) a simple proof for the existence of the asymptotic minmax lim wλ of any multi-player compact and jointly-continuous absorbing game, and (2) an explicit formula for lim wλ. Some of the results may be extended to obtain equations that asymptotic equilibrium payoffs of a multi-player game should satisfy, as the discount factor goes to zero. Note that in the non-zero-sum framework, there may be a fundamental incompatibility between equilibrium payoffs Eλ of a discounted absorbing game Γλ as λ goes to zero and equilibrium payoffs E∞ of the infinite game Γ∞. Sorin (1986), in his famous Paris match, showed that lim Eλ and E∞ may be disjoint. Thus, our formula for lim Eλ is not necessarily linked to E∞. However, the formula could be useful to prove the existence of uniform equilibria. Actually, Vrieze and Thuijsman (1989) for 2-player absorbing games and Solan (1999) for 3-player absorbing games constructed uniform equilibria (with threat) using lim Eλ.
In Sects. 2 and 3 the zero-sum game is studied. Section 4 provides formulas for the asymptotic minmax. Section 5 deals with Nash equilibria of a multi-player game. The last section extends the results of the previous sections, established for finite games, to compact and jointly-continuous games.
2 The value
Consider two finite sets I and J, two (payoff) functions f, g from I × J to [−1, 1] and a (probability transition) function p from I × J to [0, 1]. The repeated game with absorbing states is played in discrete time as follows. At stage t = 1, 2, … (if the game is not yet absorbed) player I chooses i_t ∈ I and, simultaneously, player J chooses j_t ∈ J: (i) the payoff at stage t is f(i_t, j_t); (ii) with probability 1 − p(i_t, j_t) the game is absorbed and the payoff in all future stages s > t is g(i_t, j_t); and (iii) with probability p(i_t, j_t) the situation is repeated at stage t + 1. If the stream of payoffs is r(t), t = 1, 2, …, the λ-discounted payoff of the game is Σ_{t=1}^∞ λ(1−λ)^{t−1} r(t). Player I maximizes the expected discounted payoff and player J minimizes it.
In the absorbing game described above, the game is over after absorption. One may define a general repeated game with absorbing states where, after absorption, a zero-sum repeated game in which the state never changes is reached. To play optimally in the discounted game after absorption, players should know the absorbing state that has been reached. A general absorbing game may be reduced to our more restrictive model by assuming that the absorbing payoff of an absorbing state is the value of the associated zero-sum game.
Players are allowed to use behavioral strategies. If the game is not absorbed at stage t, player I may choose his action i(t) at random according to some probability distribution8 x(t) ∈ Δ(I). Similarly, player J chooses his action j(t) at random according to some probability distribution9 y(t) ∈ Δ(J). If a player obtains some information during the game, his behavioral strategy may depend, at each stage, on all his past information up to this stage. As will be seen, Shapley (1953) proved that using such information is not necessary to play optimally in a discounted stochastic game: only the knowledge of the current state matters. Hence, in a discounted stochastic game, one needs only to assume that the states the game visits are publicly known to the players, and that the description of the game is common knowledge.
Denote by M+(I) = {α = (α^i)_{i∈I} : α^i ∈ [0, +∞)} the set of positive measures on I (the I-dimensional positive orthant). For any i and j, let p*(i, j) = 1 − p(i, j) and f*(i, j) = [1 − p(i, j)] × g(i, j). For any (α, j) ∈ M+(I) × J, a function ϕ : I × J → [−1, 1] is extended linearly as follows: ϕ(α, j) = Σ_{i∈I} α^i ϕ(i, j). Note that Δ(I) ⊂ M+(I).
8 Δ(I) = {(x^i)_{i∈I} : x^i ∈ [0, 1], Σ_{i∈I} x^i = 1} is the set of probabilities over I.
9 Δ(J) = {(y^j)_{j∈J} : y^j ∈ [0, 1], Σ_{j∈J} y^j = 1} is the set of probabilities over J.
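To make the discounted evaluation concrete, the stream Σ_{t≥1} λ(1−λ)^{t−1} r(t) can be estimated by direct simulation. The sketch below uses an illustrative 2×2 absorbing instance (the game continues only when both players play their first action, with stage payoff 0; it absorbs with payoff 1 if exactly one player deviates and 0 if both do, and f = g on absorbing pairs); all identifiers are the author's of this sketch, not notation from the paper.

```python
# Monte-Carlo estimate of the lambda-discounted payoff
# sum_{t>=1} lam*(1-lam)**(t-1) * r(t) under stationary strategies.
import random

def discounted_payoff(x, y, lam, rng):
    """One sampled play; x, y are the probabilities of 'continue'."""
    t = 0  # number of completed non-absorbing stages
    while True:
        i_cont = rng.random() < x
        j_cont = rng.random() < y
        if i_cont and j_cont:          # both continue: stage payoff 0
            t += 1
            continue
        g = 1.0 if i_cont != j_cont else 0.0   # absorbing payoff
        # In this instance f = g on absorbing pairs, so g is received from
        # stage t+1 on; its total discounted weight is
        # sum_{s >= t+1} lam*(1-lam)**(s-1) = (1-lam)**t.
        return g * (1 - lam) ** t

x, y, lam = 0.5, 0.5, 0.1
rng = random.Random(0)
n = 200_000
estimate = sum(discounted_payoff(x, y, lam, rng) for _ in range(n)) / n

# Closed form obtained from the recursion r = x*y*(1-lam)*r + x*(1-y) + (1-x)*y:
exact = (x + y - 2 * x * y) / (1 - x * y * (1 - lam))
assert abs(estimate - exact) < 0.02
```

The closed-form benchmark is obtained by solving the one-line stationarity recursion for this particular instance, as is done analytically for the quitting-game example later in this section.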
Lemma 1 (Shapley 1953) The λ-discounted game Γλ has a value v(λ). It is the unique real in [−1, 1] satisfying
v(λ) = max_{x∈Δ(I)} min_{j∈J} { λ f(x, j) + (1 − λ) p(x, j) v(λ) + (1 − λ) f*(x, j) }.  (1)
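The fixed point in Eq. 1 can be approximated by value iteration: plug a guess v into the one-shot matrix with entries λf(i, j) + (1 − λ)[p(i, j)v + f*(i, j)], take its value, and repeat; the operator is a (1 − λ)-contraction, so the iteration converges geometrically. A minimal sketch (the 2×2 data encode the quitting-game example of this section; the helper names and the closed-form check are the author's illustrative additions):

```python
# Value iteration with the Shapley operator of Eq. (1) on a 2x2 absorbing game.
import math

f = [[0.0, 1.0], [1.0, 0.0]]   # stage payoffs f(i, j)
g = [[0.0, 1.0], [1.0, 0.0]]   # absorbing payoffs g(i, j)
p = [[1.0, 0.0], [0.0, 0.0]]   # continuation probabilities p(i, j)

def matrix_value_2x2(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))   # best pure-row guarantee
    minimax = min(max(a, c), max(b, d))   # best pure-column guarantee
    if maximin == minimax:                # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)  # fully mixed value

def shapley_operator(v, lam):
    """One application of the operator in Eq. (1): v -> val(M(v))."""
    m = [[lam * f[i][j] + (1 - lam) * (p[i][j] * v + (1 - p[i][j]) * g[i][j])
          for j in range(2)] for i in range(2)]
    return matrix_value_2x2(m)

lam = 0.04
v = 0.0
for _ in range(1000):          # the error shrinks like (1 - lam)**n
    v = shapley_operator(v, lam)

assert abs(shapley_operator(v, lam) - v) < 1e-9   # fixed point of Eq. (1)
# For this particular instance the fixed point can be checked by hand:
assert abs(v - (1 - math.sqrt(lam)) / (1 - lam)) < 1e-9
```

The final assertion anticipates the closed form v(λ) = (1 − √λ)/(1 − λ) derived for this game below; here it simply serves as an independent check of the iteration.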
The asymptotic value, v, is the limit of the discounted values v(λ) as λ goes to zero. The existence of such a limit is already known from Kohlberg (1974). The first main result provides a new proof of its existence and its characterization as the value of a one-shot game simply related to the data of the game. From Eq. 1 and a martingale argument it may be deduced that player I has an optimal stationary strategy (that is, he plays the same mixed action x at each period). This implies in particular that the lemma holds even if the players have no memory or do not observe past actions. Note that those properties are valid in every discounted stochastic game (Shapley 1953) as soon as the states the game visits are publicly known to the players.
Example A quitting game
Here I = J = {C, Q}. The game is absorbed with probability 1 if one of the players chooses Q (for Quitting) and continues with probability 1 if both players choose C (for Continue); that is, p(C, C) = 1 and p(Q, C) = p(C, Q) = p(Q, Q) = 0. There are two absorbing payoffs, 1 and 0 (marked with a ∗). The absorbing payoff 1 = g(C, Q) = g(Q, C) is achieved at some period if (C, Q) or (Q, C) is played. The absorbing payoff 0 = g(Q, Q) is achieved if (Q, Q) is played. The game is non-absorbed with probability 1 if both players decide to continue and play (C, C). In that case, the stage payoff is 0 = f(C, C):
      C    Q
 C    0    1∗
 Q    1∗   0∗
Consider the following stationary strategy profile, in which player I plays (xC, (1−x)Q) at each period and player J plays (yC, (1−y)Q), where x and y are in [0, 1]: x and y are the stationary probabilities of playing C for players I and J respectively. The corresponding discounted payoff rλ(x, y) satisfies
rλ(x, y) = xy (λ × 0 + (1 − λ) rλ(x, y)) + ((1 − x)y + (1 − y)x),
so that
rλ(x, y) = (x + y − 2xy) / (1 − xy(1 − λ)).
Hence, the value vλ ∈ [0, 1] satisfies:
vλ = max_{x∈[0,1]} min_{y∈[0,1]} (x + y − 2xy) / (1 − xy(1 − λ)) = min_{y∈[0,1]} max_{x∈[0,1]} (x + y − 2xy) / (1 − xy(1 − λ)),
and it may be checked that
vλ = xλ = yλ = (1 − √λ) / (1 − λ) → 1 as λ → 0.
The value and the optimal strategies are not rational fractions of λ (but admit a Puiseux series in powers of λ). Bewley and Kohlberg (1976a) show, using algebraic tools, that this property holds for all finite stochastic games and deduce from it the existence of lim v(λ).
Lemma 2 (Vrieze and Thuijsman 1989) v(λ) satisfies
v(λ) = max_{x∈Δ(I)} min_{j∈J} [λ f(x, j) + (1 − λ) f*(x, j)] / [λ p(x, j) + p*(x, j)].
Proof If in the λ-discounted game player I plays the stationary strategy x and player J plays a pure stationary strategy j ∈ J, the λ-discounted reward r(λ, x, j) satisfies:
r(λ, x, j) = λ f(x, j) + (1 − λ) p(x, j) r(λ, x, j) + (1 − λ) f*(x, j).
Since 1 − (1 − λ)p(x, j) = 1 − p(x, j) + λp(x, j) = λp(x, j) + p*(x, j),
r(λ, x, j) = [λ f(x, j) + (1 − λ) f*(x, j)] / [λ p(x, j) + p*(x, j)].
The maximizer has a stationary optimal strategy and the minimizer has a pure stationary best reply: this proves the lemma.
In the following, α ⊥ x means that for every i ∈ I, x^i > 0 ⇒ α^i = 0. Letting the discount factor tend to zero in Vrieze and Thuijsman's (1989) formula yields:
Theorem 3 As λ goes to zero, v(λ) converges to
v = max_{x∈Δ(I)} sup_{α⊥x, α∈M+(I)} min_{j∈J} [ (f*(x, j)/p*(x, j)) 1_{p*(x, j)>0} + ((f(x, j) + f*(α, j))/(p(x, j) + p*(α, j))) 1_{p*(x, j)=0} ].
The intuitive meaning of this formula is simple and closely related to similar ideas in Coulomb (2001), Flesch et al. (1996), Vrieze and Thuijsman (1989) and others: x is the limit of the discounted optimal strategies x(λ) as λ → 0, and α is related to the second-order action (x(λ) − x)/λ. The max on x is achieved; the sup on α may not be attainable.
Proof Let w = lim_{n→∞} v(λn) be an accumulation point of v(λ).
Step 1: Consider an optimal stationary strategy x(λn) for player I and go to the limit using Shapley's dynamic programming principle. From Vrieze and Thuijsman's (1989) formula, there exists x(λn) ∈ Δ(I) such that for every j ∈ J,
v(λn) ≤ [λn f(x(λn), j) + (1 − λn) f*(x(λn), j)] / [λn p(x(λn), j) + p*(x(λn), j)].  (2)
By compactness of Δ(I) it may be supposed that x(λn) → x.
Case 1: p*(x, j) > 0. Letting λn go to zero implies w ≤ f*(x, j)/p*(x, j).
Case 2: p*(x, j) = Σ_{i∈I} x^i p*(i, j) = 0. Thus Σ_{i∈S(x)} p*(i, j) = 0, where S(x) = {i ∈ I : x^i > 0} is the support of x. Let α(λn) = (x^i(λn)/λn 1_{x^i=0})_{i∈I} ∈ M+(I), so that α(λn) ⊥ x. Consequently,
Σ_{i∈I} (x^i(λn)/λn) p*(i, j) = Σ_{i∉S(x)} (x^i(λn)/λn) p*(i, j) = Σ_{i∈I} α^i(λn) p*(i, j) = p*(α(λn), j),
and
Σ_{i∈I} (x^i(λn)/λn) f*(i, j) = Σ_{i∈I} α^i(λn) f*(i, j) = f*(α(λn), j),
so, from Eq. 2, and because p(x, j) = 1,
w ≤ lim inf_{n→∞} [f(x, j) + (1 − λn) f*(α(λn), j)] / [p(x, j) + p*(α(λn), j)].  (3)
Since J is finite, for any ε > 0 there is N(ε) such that, for every j ∈ J, w ≤ [f(x, j) + f*(α(λ_{N(ε)}), j)] / [p(x, j) + p*(α(λ_{N(ε)}), j)] + ε. Consequently, w ≤ v.
Step 2: Construct a strategy for player I proportional to x + λn α that guarantees v − ε in the λn-discounted game as λn → 0. Let (αε, xε) ∈ M+(I) × Δ(I) be ε-optimal for the maximizer in the formula of v. For λn small enough, let xε(λn) be proportional to xε + λn αε (that is, xε(λn) = μn (xε + λn αε) for some μn > 0). Let r(λn) be the unique real in the interval [−1, 1] that satisfies
r(λn) = min_{j∈J} { λn f(xε(λn), j) + (1 − λn) p(xε(λn), j) r(λn) + (1 − λn) f*(xε(λn), j) }.  (4)
By the linearity of f, p, f* and p* in x,
r(λn) = min_{j} [λn f(xε + λn αε, j) + (1 − λn) f*(xε + λn αε, j)] / [λn p(xε + λn αε, j) + p*(xε + λn αε, j)]
 = min_{j} [λn f(xε, j) + λn² f(αε, j) + (1 − λn) f*(xε, j) + (1 − λn) λn f*(αε, j)] / [λn p(xε, j) + λn² p(αε, j) + p*(xε, j) + λn p*(αε, j)].
Also, v(λn) ≥ r(λn), since r(λn) is the payoff of player I when he plays the stationary strategy xε(λn). Let j_{λn} ∈ J be a pure stationary best response of player J against xε(λn) (an element of the arg min in (4)). Since J is finite and r(λn) is bounded, one can
switch to a subsequence and suppose that j_{λn} is constant (= j) and that r(λn) → r. If p*(xε, j) > 0 then r = f*(xε, j)/p*(xε, j). If p*(xε, j) = 0, clearly r = [f(xε, j) + f*(αε, j)] / [p(xε, j) + p*(αε, j)]. Consequently, w = lim v(λn) ≥ v − ε.
Step 2 of the above proof shows that for each ε > 0 there are xε and αε ⊥ xε such that player I admits a 2ε-optimal strategy in the λ-discounted game proportional to xε + λαε, for all λ small enough (so that |vλ − v| < ε). The quitting-game example shows that a 0-optimal strategy of the λ-discounted game is not always of that form. This property makes it possible to identify the asymptotic value as the value of what may be called the asymptotic game (following a terminology initiated in Sorin (2002)), which we now define.
For any (α, β) ∈ M+(I) × M+(J), a function ϕ : I × J → [−1, 1] is extended bilinearly by ϕ(α, β) = Σ_{i∈I, j∈J} α^i β^j ϕ(i, j). For player I, the strategy set in the asymptotic game is Θ(I) = {(x, α) ∈ Δ(I) × M+(I) : α ⊥ x} and similarly for player J, Θ(J) = {(y, β) ∈ Δ(J) × M+(J) : β ⊥ y}. The payoff function of the asymptotic game is:
A(x, α, y, β) := (f*(x, y)/p*(x, y)) 1_{p*(x, y)>0} + ((f(x, y) + f*(α, y) + f*(x, β)) / (p(x, y) + p*(α, y) + p*(x, β))) 1_{p*(x, y)=0}.
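The role of the unbounded perturbation α can be seen on the quitting game: take x = C (pure continuation) and put mass a on Q in α. Against y = Q the first term of A already gives 1, while against y = C the second term gives a/(1 + a) → 1 as a → ∞, so sup_α inf_y A = 1 = lim vλ and the sup is not attained. A small numeric sketch (illustrative code; actions ordered (C, Q), dictionary names are the author's):

```python
# Evaluating the asymptotic-game payoff A on the quitting game.
f  = {('C','C'): 0.0, ('C','Q'): 1.0, ('Q','C'): 1.0, ('Q','Q'): 0.0}
p  = {('C','C'): 1.0, ('C','Q'): 0.0, ('Q','C'): 0.0, ('Q','Q'): 0.0}
g  = {('C','C'): 0.0, ('C','Q'): 1.0, ('Q','C'): 1.0, ('Q','Q'): 0.0}
ps = {ij: 1.0 - p[ij] for ij in p}        # p*(i, j)
fs = {ij: ps[ij] * g[ij] for ij in p}     # f*(i, j) = (1 - p) g

def ext(phi, a, b):
    """Bilinear extension phi(alpha, beta) of a function on action pairs."""
    return sum(a[i] * b[j] * phi[(i, j)] for (i, j) in phi)

def A(x, alpha, y, beta):
    """Payoff of the asymptotic game defined above."""
    if ext(ps, x, y) > 0:
        return ext(fs, x, y) / ext(ps, x, y)
    num = ext(f, x, y) + ext(fs, alpha, y) + ext(fs, x, beta)
    den = ext(p, x, y) + ext(ps, alpha, y) + ext(ps, x, beta)
    return num / den

x = {'C': 1.0, 'Q': 0.0}                  # pure continuation
zero = {'C': 0.0, 'Q': 0.0}
for a in (1.0, 10.0, 1000.0):
    alpha = {'C': 0.0, 'Q': a}            # second-order mass on Q
    worst = min(A(x, alpha, {'C': 1.0, 'Q': 0.0}, zero),   # against y = C
                A(x, alpha, {'C': 0.0, 'Q': 1.0}, zero))   # against y = Q
    assert abs(worst - a / (1 + a)) < 1e-12   # -> 1 only as a -> infinity
```

This also illustrates why Θ(I) must allow arbitrarily large measures α: no finite a attains the value 1.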
The first two formulas of the following corollary state that the asymptotic game has a value, and that it is the same as the asymptotic value of the absorbing game.
Corollary 4 v satisfies the following equations:
v = sup_{(x,α)∈Θ(I)} inf_{(y,β)∈Θ(J)} A(x, α, y, β)
 = inf_{(y,β)∈Θ(J)} sup_{(x,α)∈Θ(I)} A(x, α, y, β)
 = sup_{(x,α)∈Θ(I)} inf_{y∈Δ(J)} [ (f*(x, y)/p*(x, y)) 1_{p*(x, y)>0} + ((f(x, y) + f*(α, y))/(p(x, y) + p*(α, y))) 1_{p*(x, y)=0} ].
Proof Consider an ε-optimal strategy xε(λ) proportional to xε + λαε in the λ-discounted game. Taking any strategy of player J proportional to y(λ) = y + λβ yields
v(λ) − ε ≤ [λ f(xε + λαε, y + λβ) + (1 − λ) f*(xε + λαε, y + λβ)] / [λ p(xε + λαε, y + λβ) + p*(xε + λαε, y + λβ)].
If p*(xε, y) > 0 this implies v = lim v(λ) ≤ f*(xε, y)/p*(xε, y). If p*(xε, y) = 0 then f*(xε, y) = 0. Using the multi-linearity of f, f*, p and p* and dividing by λ imply:
v(λ) − ε ≤ [f(xε + λαε, y + λβ) + (1 − λ) f*(αε, y) + (1 − λ) f*(xε, β) + (1 − λ)λ f*(αε, β)] / [p(xε + λαε, y + λβ) + p*(αε, y) + p*(xε, β) + λ p*(αε, β)].
Going to the limit,
v ≤ [f(xε, y) + f*(αε, y) + f*(xε, β)] / [p(xε, y) + p*(αε, y) + p*(xε, β)],
which holds for all (y, β). Thus,
v ≤ sup_{(x,α)∈Θ(I)} inf_{(y,β)∈Θ(J)} [ (f*(x, y)/p*(x, y)) 1_{p*(x, y)>0} + ((f(x, y) + f*(α, y) + f*(x, β))/(p(x, y) + p*(α, y) + p*(x, β))) 1_{p*(x, y)=0} ].
And similarly for the other inequality. Since the inf sup is always at least the sup inf, the first two equalities follow. Taking β = 0 in the last inequality implies:
v ≤ sup_{(x,α)∈Θ(I)} inf_{y∈Δ(J)} [ (f*(x, y)/p*(x, y)) 1_{p*(x, y)>0} + ((f(x, y) + f*(α, y))/(p(x, y) + p*(α, y))) 1_{p*(x, y)=0} ],
and from the formula of v in Theorem 3, one obtains the last equality of the corollary.
3 Absorption controlled by one player
Consider the following zero-sum absorbing game (the big match) introduced by Gillette (1957).
Here, I = {T, B} and J = {L, R}. If player I plays Top, the game is absorbed with probability 1, and if he plays Bottom, the game continues with probability 1. Absorbing payoffs are marked with a ∗ as in the quitting-game example:
      L    R
 T    1∗   0∗
 B    0    1
It is easy to show that v(λ) = 1/2 and that the unique optimal strategy for player I is to play T with probability λ/(1 + λ). Consequently, v = 1/2, which also happens to be the value of the underlying one-shot game with payoff matrix (1 0; 0 1). On the other hand, the asymptotic value of the quitting game is 1, which is not the value (= 1/2) of its underlying one-shot game (0 1; 1 0). A natural question arises: for which absorbing games is v the value of an underlying one-shot game? A game is partially-controlled by player I if the transition function p(i, j) depends only on i (but not the payoff functions).
Proposition 5 If a zero-sum absorbing game is partially-controlled by player I, the asymptotic value equals the value u of the underlying one-shot game, defined by:
v = u = max_{x∈Δ(I)} min_{j∈J} [ Σ_{i∉I*} x^i f(i, j) + Σ_{i∈I*} x^i g(i, j) ],
where I* = {i : p*(i) > 0} is the set of absorbing actions of player I. Coulomb (2001) proved a similar result for big-match games.
Proof Step 1: v ≤ u. Let xε ∈ Δ(I) and αε ⊥ xε ∈ M+(I) be ε-optimal in the formula of v. If p*(xε) > 0 then
v − ε ≤ min_{j∈J} f*(xε, j)/p*(xε) = min_{j∈J} Σ_{i∈I} (xε^i p*(i)/p*(xε)) g(i, j) = min_{j∈J} Σ_{i∈I*} (xε^i p*(i)/p*(xε)) g(i, j) ≤ max_{z∈Δ(I*)} min_{j∈J} Σ_{i∈I*} z^i g(i, j) ≤ u.
If p*(xε) = 0 then xε^i = 0 for i ∈ I*, and p*(i) = 0 when i ∉ I*, so that
v − ε ≤ min_{j∈J} [f(xε, j) + f*(αε, j)] / [p(xε) + p*(αε)]
 = min_{j∈J} [ Σ_{i∉I*} (xε^i/(p(xε) + p*(αε))) f(i, j) + Σ_{i∈I*} (αε^i p*(i)/(p(xε) + p*(αε))) g(i, j) ].
But when i ∉ I*, p(i) = 1, thus:
v − ε ≤ min_{j∈J} [ Σ_{i∉I*} (xε^i p(i)/(p(xε) + p*(αε))) f(i, j) + Σ_{i∈I*} (αε^i p*(i)/(p(xε) + p*(αε))) g(i, j) ] ≤ u,
where the last inequality holds because the coefficients sum to one, so the right-hand side is the payoff of a mixed strategy in the one-shot game associated with u.
Step 2: v ≥ u. Let x0 be optimal for player I in the one-shot matrix game associated with u. Define (x1, α1) as follows. If p*(x0) = 0, let (x1, α1) = (x0, 0). This clearly implies that v ≥ u. If p*(x0) > 0 then for all i ∈ I*, let x1^i = 0 (so that x1 is non-absorbing) and for i ∉ I*, let α1^i = 0. This will imply that
v ≥ min_{j∈J} [f(x1, j) + f*(α1, j)] / [p(x1) + p*(α1)]
 = min_{j∈J} [ Σ_{i∉I*} (x1^i/(p(x1) + p*(α1))) f(i, j) + Σ_{i∈I*} (α1^i p*(i)/(p(x1) + p*(α1))) g(i, j) ]
 = min_{j∈J} [ Σ_{i∉I*} (x1^i p(i)/(p(x1) + p*(α1))) f(i, j) + Σ_{i∈I*} (α1^i p*(i)/(p(x1) + p*(α1))) g(i, j) ].
Complete the definition of (x1, α1) as follows. For i ∉ I*, let x0^i = x1^i p(i)/(p(x1) + p*(α1)) (x1 is proportional to x0 on I \ I*) and for i ∈ I*, let x0^i = α1^i p*(i)/(p(x1) + p*(α1)) (α1 is proportional to x0 on I*). Consequently,
v ≥ min_{j∈J} [ Σ_{i∉I*} x0^i f(i, j) + Σ_{i∈I*} x0^i g(i, j) ] = u.
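As a numerical check of Lemma 2 and of the big-match computation above, the stationary-strategy formula can be maximized directly over the probability x of playing Top (illustrative code; the identifiers are the author's):

```python
# Stationary-value formula of Lemma 2 for the big match:
# I = {T, B}, J = {L, R}; T absorbs (payoffs 1*, 0*), B continues
# with stage payoffs 0 against L and 1 against R.
def payoff(x, j, lam):
    """[lam f(x,j) + (1-lam) f*(x,j)] / [lam p(x,j) + p*(x,j)], x = P(T)."""
    if j == 'L':
        f, fs = x, x                   # f(x,L) = x, f*(x,L) = x * 1
    else:
        f, fs = 1 - x, 0.0             # f(x,R) = 1-x, f*(x,R) = 0
    # p depends only on i here: p(x, j) = 1 - x, p*(x, j) = x.
    return (lam * f + (1 - lam) * fs) / (lam * (1 - x) + x)

lam = 0.1
grid = [k / 100000 for k in range(100001)]
v_lam = max(min(payoff(x, j, lam) for j in ('L', 'R')) for x in grid)
x_star = lam / (1 + lam)               # claimed optimal probability of T

assert abs(v_lam - 0.5) < 1e-3         # v(lam) = 1/2 for every lam
assert abs(min(payoff(x_star, j, lam) for j in ('L', 'R')) - 0.5) < 1e-12
```

At x = λ/(1 + λ) the two fractions x/(x + λ(1 − x)) and λ(1 − x)/(x + λ(1 − x)) are equal, which is exactly the indifference condition behind v(λ) = 1/2.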
4 The minmax
A team of N players (named I) plays against player J. Assume the finiteness of all the action sets: each player k in team I has a finite set of actions I_k and player J has a finite set of actions J. Let I = I_1 × ⋯ × I_N, let f, g : I × J → [−1, 1] and p : I × J → [0, 1]. The game is played as above, except that at each period the players in team I randomize independently (they are not allowed to correlate their random moves). Let Δ = Δ(I_1) × ⋯ × Δ(I_N), p*(·) = 1 − p(·), f*(·) = p*(·) × g(·) and M+ = M+(I_1) × ⋯ × M+(I_N). For x ∈ Δ, j ∈ J, k ∈ {1, …, N} and α ∈ M+, a function ϕ : I × J → [−1, 1] is extended multi-linearly as follows:
ϕ(x, j) = Σ_{i=(i_1,…,i_N)∈I} x_1^{i_1} × ⋯ × x_N^{i_N} ϕ(i, j),
ϕ(α_k, x_{−k}, j) = Σ_{i=(i_1,…,i_N)∈I} x_1^{i_1} × ⋯ × x_{k−1}^{i_{k−1}} × α_k^{i_k} × x_{k+1}^{i_{k+1}} × ⋯ × x_N^{i_N} ϕ(i, j).
Let w(λ) be the maximum payoff that team I can guarantee against player J. From Bewley and Kohlberg (1976a) and Neyman (2003) one can deduce the existence of w(λ). However, no explicit formula exists.
Theorem 6 w(λ) = max_{x∈Δ} min_{j∈J} [λ f(x, j) + (1 − λ) f*(x, j)] / [λ p(x, j) + p*(x, j)] and, as λ → 0, it converges to
w = max_{x∈Δ} sup_{α∈M+: ∀k, α_k⊥x_k} min_{j∈J} [ (f*(x, j)/p*(x, j)) 1_{p*(x, j)>0} + ((f(x, j) + Σ_{k=1}^N f*(α_k, x_{−k}, j))/(p(x, j) + Σ_{k=1}^N p*(α_k, x_{−k}, j))) 1_{p*(x, j)=0} ]
 = max_{x∈Δ} sup_{α∈M+: ∀k, α_k⊥x_k} min_{y∈Δ(J)} [ (f*(x, y)/p*(x, y)) 1_{p*(x, y)>0} + ((f(x, y) + Σ_{k=1}^N f*(α_k, x_{−k}, y))/(p(x, y) + Σ_{k=1}^N p*(α_k, x_{−k}, y))) 1_{p*(x, y)=0} ].
Proof For the first formula, follow the ideas in the proofs of Theorem 3 and Corollary 4. Let w = lim_{n→∞} w(λn), where λn → 0, be an accumulation point of w(λ), and denote by v the right-hand side of the asymptotic formula.
Modifications in step 1 of Theorem 3: let x(λn) → x be such that for every j ∈ J,
w(λn) ≤ [λn f(x(λn), j) + (1 − λn) f*(x(λn), j)] / [λn p(x(λn), j) + p*(x(λn), j)].
Let y(λn) = x(λn) − x → 0, so that:
p*(x(λn), j) = Σ_{i=(i_1,…,i_N)∈I} x_1^{i_1}(λn) × ⋯ × x_N^{i_N}(λn) p*(i, j)
 = Σ_{i∈I} (y_1^{i_1}(λn) + x_1^{i_1}) × ⋯ × (y_N^{i_N}(λn) + x_N^{i_N}) p*(i, j)
 = p*(x, j) + Σ_{k=1}^N p*(y_k(λn), x_{−k}, j) + o(Σ_{k=1}^N p*(y_k(λn), x_{−k}, j)).
If p*(x, j) > 0 then w ≤ f*(x, j)/p*(x, j). If p*(x, j) = 0 and if α_k(λn) = (x_k^{i_k}(λn)/λn 1_{x_k^{i_k}=0})_{i_k∈I_k} ∈ M+(I_k), then α_k(λn) ⊥ x_k and
p*(x(λn), j)/λn = Σ_{k=1}^N p*(α_k(λn), x_{−k}, j) + o(Σ_{k=1}^N p*(α_k(λn), x_{−k}, j)),
and the same is true for f*, so that
w ≤ lim inf_{n→∞} [f(x, j) + Σ_{k=1}^N f*(α_k(λn), x_{−k}, j)] / [p(x, j) + Σ_{k=1}^N p*(α_k(λn), x_{−k}, j)],
which implies that w ≤ v.
Modifications in step 2 of Theorem 3: take (α^ε, x^ε) to be ε-optimal for the maximizer in the formula of v, define x_k^ε(λn) to be proportional to x_k^ε + λn α_k^ε for every k, then use the Taylor expansion above and step 2 of Theorem 3 to deduce that w ≥ v − ε.
For the second formula, follow Corollary 4. For each ε > 0, the modification of step 2 just above implies that the players in I have an ε-optimal strategy (x_k^ε(λ))_k in the λ-discounted game, where x_k^ε(λ) is proportional to x_k^ε + λα_k^ε, for all λ small enough. This implies that for any y ∈ Δ(J),
w(λ) − ε ≤ [λ f(x^ε(λ), y) + (1 − λ) f*(x^ε(λ), y)] / [λ p(x^ε(λ), y) + p*(x^ε(λ), y)],
where the right-hand side is a fractional function of λ. Consequently, it admits a limit, which may be computed as in step 1 (using the multi-linearity of payoffs and transitions). This implies that
w ≤ max_{x∈Δ} sup_{α∈M+: ∀k, α_k⊥x_k} min_{y∈Δ(J)} [ (f*(x, y)/p*(x, y)) 1_{p*(x, y)>0} + ((f(x, y) + Σ_{k=1}^N f*(α_k, x_{−k}, y))/(p(x, y) + Σ_{k=1}^N p*(α_k, x_{−k}, y))) 1_{p*(x, y)=0} ].
The first formula for w and the fact that J ⊂ Δ(J) imply the other inequality.
5 Stationary Nash equilibria
Consider an N-player absorbing game where each player k ∈ {1, …, N} has a finite set of actions I_k. Define the payoff functions f_k : I → [−1, 1] and g_k : I → [−1, 1], k ∈ {1, …, N}, and a probability transition p : I → [0, 1], where I = I_1 × ⋯ × I_N. The game is played as above, except that if at stage t player k = 1, …, N chooses the action i_k^t ∈ I_k, then player k receives f_k(i_1^t, …, i_N^t), and if the game is absorbed he receives g_k(i_1^t, …, i_N^t). From Fink (1964), the λ-discounted game admits a stationary Nash equilibrium. A computation as in Vrieze and Thuijsman's formula above may be used to establish that x(λ) ∈ Δ, with corresponding payoff u(λ) = (u_1(λ), …, u_N(λ)) ∈ R^N, is a stationary equilibrium iff for every player k, Fink's equations are satisfied:
x_k(λ) ∈ arg max_{x_k∈Δ(I_k)} [λ f_k(x_k, x_{−k}(λ)) + (1 − λ) f_k*(x_k, x_{−k}(λ))] / [λ p(x_k, x_{−k}(λ)) + p*(x_k, x_{−k}(λ))],
u_k(λ) = max_{x_k∈Δ(I_k)} [λ f_k(x_k, x_{−k}(λ)) + (1 − λ) f_k*(x_k, x_{−k}(λ))] / [λ p(x_k, x_{−k}(λ)) + p*(x_k, x_{−k}(λ))].
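As a sketch of Fink's characterization, consider the quitting game of Sect. 2 written in this N-player notation (N = 2, player 2's payoffs being the negatives of player 1's). Since the game absorbs as soon as some player quits, the stationary discounted reward has the closed form computed in Sect. 2, and the candidate profile x = y = (1 − √λ)/(1 − λ) should leave neither player a profitable stationary deviation. All identifiers below are the author's illustrative choices:

```python
# Checking Fink's equations for the zero-sum quitting game at the
# candidate stationary equilibrium x = y = (1 - sqrt(lam)) / (1 - lam).
import math

def r(x, y, lam):
    """lambda-discounted payoff of player 1 under stationary (x, y);
    x, y are the probabilities of playing C (player 2 gets -r)."""
    return (x + y - 2 * x * y) / (1 - x * y * (1 - lam))

lam = 0.04
star = (1 - math.sqrt(lam)) / (1 - lam)   # candidate probability of C
grid = [k / 10000 for k in range(10001)]  # stationary deviations to test

u1 = r(star, star, lam)
# Fink's equations: no profitable stationary deviation for either player.
assert max(r(x, star, lam) for x in grid) <= u1 + 1e-9   # player 1 maximizes r
assert min(r(star, y, lam) for y in grid) >= u1 - 1e-9   # player 2 minimizes r
assert abs(u1 - star) < 1e-12             # equilibrium payoff equals v(lam)
```

Against y = (1 − √λ)/(1 − λ), player 1's reward is in fact constant in x (the opponent's strategy makes him indifferent), which is why the arg max condition holds for the whole simplex.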
The asymptotic game is defined as follows. The set of strategies of player k is

$$\tilde{\Delta}(I_k) = \left\{ (x_k, \alpha_k) \in \Delta(I_k) \times M_+(I_k) : \alpha_k \perp x_k \right\}$$

and the payoff function of player k is

$$A_k(x, \alpha) = \frac{f_k^*(x)}{p^*(x)}\, 1_{\{p^*(x) > 0\}} + \frac{f_k(x) + \sum_{j=1}^{N} f_k^*(\alpha_j, x_{-j})}{p(x) + \sum_{j=1}^{N} p^*(\alpha_j, x_{-j})}\, 1_{\{p^*(x) = 0\}}.$$

Theorem 7 Let u = (u_1, …, u_N) ∈ [−1, 1]^N be an accumulation point of u(λ). Then u is a limit equilibrium payoff of the asymptotic game. More precisely, there exist x and a sequence of measures α_j(n) ⊥ x_j, j = 1, …, N, such that for every player k:

$$u_k = \lim_n A_k(x, \alpha(n)) = \lim_n \left[ \frac{f_k^*(x)}{p^*(x)}\, 1_{\{p^*(x) > 0\}} + \frac{f_k(x) + \sum_{j=1}^{N} f_k^*(\alpha_j(n), x_{-j})}{p(x) + \sum_{j=1}^{N} p^*(\alpha_j(n), x_{-j})}\, 1_{\{p^*(x) = 0\}} \right]$$

and

$$u_k \ge \sup_{(x_k, \alpha_k) \in \tilde{\Delta}(I_k)} \limsup_n A_k(x_k, \alpha_k, x_{-k}, \alpha_{-k}(n)) = \sup_{(x_k, \alpha_k) \in \tilde{\Delta}(I_k)} \left( \limsup_n \left[ \frac{f_k^*(x_k, x_{-k})}{p^*(x_k, x_{-k})}\, 1_{\{p^*(x_k, x_{-k}) > 0\}} + \frac{f_k(x_k, x_{-k}) + f_k^*(\alpha_k, x_{-k}) + \sum_{j \ne k} f_k^*(\alpha_j(n), x_{-j})}{p(x_k, x_{-k}) + p^*(\alpha_k, x_{-k}) + \sum_{j \ne k} p^*(\alpha_j(n), x_{-j})}\, 1_{\{p^*(x_k, x_{-k}) = 0\}} \right] \right).$$
One may ask: does any limit equilibrium payoff of the asymptotic game correspond to the limit of some λ_n-discounted equilibrium payoff as λ_n goes to zero? The equations in the theorem and the proof below suggest that any limit equilibrium payoff of the asymptotic game is the limit of some ε_n-equilibrium payoff of the λ_n-discounted game as ε_n and λ_n go to zero. The strategy for player k in the λ_n-discounted game would be proportional to x_k + λ_n α_k(n).

Proof Let x(λ_n) ∈ Δ be a stationary equilibrium of the λ_n-discounted absorbing game, let u(λ_n) = (u_1(λ_n), …, u_N(λ_n)) ∈ R^N be its payoff, and suppose w.l.o.g. that x(λ_n) → x and u(λ_n) → u. From Fink's equations, one deduces that:

$$u_k(\lambda_n) = \frac{\lambda_n f_k(x(\lambda_n)) + (1-\lambda_n)\, f_k^*(x(\lambda_n))}{\lambda_n p(x(\lambda_n)) + p^*(x(\lambda_n))}.$$

If p*(x) > 0 then u_k = f_k*(x)/p*(x). If p*(x) = 0, define, for j = 1, …, N,

$$\alpha_j(n) = \sum_{i_j \in I_j} \frac{x_j^{i_j}(\lambda_n)}{\lambda_n}\, 1_{\{x_j^{i_j} = 0\}}\, \delta_{i_j} \in M_+(I_j),$$

so that α_j(n) ⊥ x_j. Consequently,

$$\frac{p^*(x(\lambda_n))}{\lambda_n} = \sum_{j=1}^{N} p^*(\alpha_j(n), x_{-j}) + o\!\left( \sum_{j=1}^{N} p^*(\alpha_j(n), x_{-j}) \right)$$

and the same is true for f*. Thus, considering a subsequence if necessary, one obtains:

$$u_k = \lim_n \frac{f_k(x) + \sum_{j=1}^{N} f_k^*(\alpha_j(n), x_{-j})}{p(x) + \sum_{j=1}^{N} p^*(\alpha_j(n), x_{-j})}.$$

Again, from Fink's equations one deduces that for every α_k ⊥ x_k:

$$u_k(\lambda_n) \ge \frac{\lambda_n f_k(x_k + \lambda_n \alpha_k, x_{-k}(\lambda_n)) + (1-\lambda_n)\, f_k^*(x_k + \lambda_n \alpha_k, x_{-k}(\lambda_n))}{\lambda_n p(x_k + \lambda_n \alpha_k, x_{-k}(\lambda_n)) + p^*(x_k + \lambda_n \alpha_k, x_{-k}(\lambda_n))}.$$

Using multi-linearity and defining α_j(n), j ≠ k, as above proves that:

$$u_k \ge \limsup_n \left[ \frac{f_k^*(x_k, x_{-k})}{p^*(x_k, x_{-k})}\, 1_{\{p^*(x_k, x_{-k}) > 0\}} + \frac{f_k(x_k, x_{-k}) + f_k^*(\alpha_k, x_{-k}) + \sum_{j \ne k} f_k^*(\alpha_j(n), x_{-j})}{p(x_k, x_{-k}) + p^*(\alpha_k, x_{-k}) + \sum_{j \ne k} p^*(\alpha_j(n), x_{-j})}\, 1_{\{p^*(x_k, x_{-k}) = 0\}} \right].$$
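The asymptotic payoff A_k defined above is straightforward to evaluate. The sketch below does so for a hypothetical two-player quitting game; all numerical data are assumptions made only for the illustration. Under x both players continue, so p*(x) = 0 and the second branch of A_k applies.

```python
# Evaluate the asymptotic-game payoff A_1(x, alpha) for N = 2 when p*(x) = 0.
# Hypothetical quitting game (assumed data): action 0 = continue, 1 = quit;
# absorption occurs unless both players continue.

A = 2  # actions per player
pstar  = [[0.0, 1.0], [1.0, 1.0]]   # absorption probabilities
f1     = [[0.0, 0.0], [0.0, 0.0]]   # player 1's stage payoffs (flat here)
g1     = [[0.0, 0.5], [1.0, 0.2]]   # player 1's absorbing payoffs
f1star = [[pstar[i][j] * g1[i][j] for j in range(A)] for i in range(A)]

def bilinear(M, x, y):
    """Bilinear extension M(x, y) = sum_i sum_j x_i y_j M[i][j]."""
    return sum(x[i] * y[j] * M[i][j] for i in range(A) for j in range(A))

def A1(x1, x2, a1, a2):
    """A_1(x, alpha) in the case p*(x) = 0:
       (f1(x) + f1*(a1, x2) + f1*(x1, a2)) / (p(x) + p*(a1, x2) + p*(x1, a2))."""
    p = [[1.0 - pstar[i][j] for j in range(A)] for i in range(A)]
    num = bilinear(f1, x1, x2) + bilinear(f1star, a1, x2) + bilinear(f1star, x1, a2)
    den = bilinear(p, x1, x2) + bilinear(pstar, a1, x2) + bilinear(pstar, x1, a2)
    return num / den

x1, x2 = [1.0, 0.0], [1.0, 0.0]   # both players continue: p*(x) = 0
a1, a2 = [0.0, 1.0], [0.0, 1.0]   # perturbation measures on "quit" (a_k ⊥ x_k)

print(A1(x1, x2, a1, a2))
```

Player 1's deviation incentives in Theorem 7 can then be explored by varying (x1, a1) while keeping (x2, a2) fixed.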
6 Compact continuous games

Let us extend the model of zero-sum games. I and J are now assumed to be compact metric sets. The game is separately (resp. jointly) continuous if f, g and p are separately (resp. jointly) continuous functions on I × J. Δ(K), K = I, J, is the set of Borel probability measures on K, and M₊(K) is the set of Borel positive measures on K. They are endowed with the weak* topology. For (α, β) ∈ M₊(I) × M₊(J) and a measurable ϕ : I × J → [−1, 1], ϕ(α, β) = ∫_{I×J} ϕ(i, j) dα(i) dβ(j). This framework was introduced in Rosenberg and Sorin (2001). Following the operator approach of Kohlberg (1974), Rosenberg and Sorin considered the Shapley operator r ↦ Φ(λ, r) where

$$\Phi(\lambda, r) = \max_{x \in \Delta(I)} \min_{y \in \Delta(J)} \left[ \lambda f(x, y) + (1-\lambda)\, p(x, y)\, r + (1-\lambda)\, f^*(x, y) \right] = \min_{y \in \Delta(J)} \max_{x \in \Delta(I)} \left[ \lambda f(x, y) + (1-\lambda)\, p(x, y)\, r + (1-\lambda)\, f^*(x, y) \right].$$
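Since the operator is (1 − λ)-contracting, v(λ) can be computed by fixed-point iteration. The following sketch is illustrative only: it assumes the Big Match as the test game (not an example of this section) and uses the closed-form solution for the value of a 2 × 2 matrix game.

```python
# Fixed-point iteration of the Shapley operator for an absorbing game.
# Test game: the Big Match (an assumed example; its discounted value is 1/2).
# Rows (player 1): 0 = Top (absorbing), 1 = Bottom; columns (player 2): 0, 1.

f     = [[1.0, 0.0], [0.0, 1.0]]   # stage payoffs
pstar = [[1.0, 1.0], [0.0, 0.0]]   # absorption probabilities
fstar = [[1.0, 0.0], [0.0, 0.0]]   # pstar * absorbing payoff

def val2x2(M):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = M
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin >= minmax:                        # pure saddle point
        return maxmin
    return (a * d - b * c) / (a + d - b - c)    # completely mixed case

def shapley_operator(lam, r):
    """Phi(lam, r): value of the one-shot game with entries
       lam*f + (1-lam)*(p*r + fstar)."""
    M = [[lam * f[i][j] + (1 - lam) * ((1 - pstar[i][j]) * r + fstar[i][j])
          for j in range(2)] for i in range(2)]
    return val2x2(M)

def v(lam, iterations=500):
    """v(lam) as the unique fixed point of the (1-lam)-contracting operator."""
    r = 0.0
    for _ in range(iterations):
        r = shapley_operator(lam, r)
    return r

for lam in (0.5, 0.1, 0.01):
    print(lam, v(lam))
```

For the Big Match the iteration converges to 1/2 for every λ, consistent with the known fact that its discounted value is constant.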
The operator is well defined and the existence of the value is guaranteed via Sion's minmax theorem. As Shapley (1953) already noticed, the operator is (1 − λ)-contracting, so that the value of the λ-discounted game v(λ) is the unique fixed point of Φ(λ, ·). Kohlberg (1974), in finite absorbing games, and Rosenberg and Sorin (2001), in compact and separately-continuous absorbing games, proved the existence of v = lim v(λ) and provided a variational characterization of v using the information obtained from the derivative of Φ(λ, r) around λ ≈ 0. Notations for a multi-player absorbing game could be introduced similarly.

Theorem 8 If the game is compact and jointly-continuous, all the results proved above for finite games still hold (for lim v(λ), lim w(λ) and Nash equilibria).

Proof Let us show how the first part of Theorem 3 is modified. Let w = lim_{n→∞} v(λ_n) where λ_n → 0. Take an optimal strategy x(λ_n) of player I in the λ_n-discounted game and suppose w.l.o.g. that it converges to some x. Consider any strategy j of player J, so that:

$$v(\lambda_n) \le \frac{\lambda_n f(x(\lambda_n), j) + (1-\lambda_n)\, f^*(x(\lambda_n), j)}{\lambda_n p(x(\lambda_n), j) + p^*(x(\lambda_n), j)}.$$

If p*(x, j) > 0 then w ≤ f*(x, j)/p*(x, j). If p*(x, j) = 0 then p*(i, j) = 0 for every i ∈ S(x), the support of x. Define α(λ_n) ∈ M₊(I) by dα(λ_n)(i) = (dx(λ_n)(i)/λ_n) 1_{i ∉ S(x)}. Let s_n ≥ 0 be such that α(λ_n) = s_n σ(λ_n) with σ(λ_n) ∈ Δ(I), and assume w.l.o.g. that σ(λ_n) → σ and s_n → t ∈ [0, +∞] (by compactness of Δ(I)). Using joint continuity, the fact that p(x, j) = 1, and the fact that payoffs are uniformly bounded by 1, one obtains that for any ε > 0 there is N(ε) such that for all n ≥ N(ε) and all j ∈ J:

$$\frac{f(x(\lambda_n), j) + (1-\lambda_n)\, f^*(\alpha(\lambda_n), j)}{p(x(\lambda_n), j) + p^*(\alpha(\lambda_n), j)} \le \frac{f(x, j) + \varepsilon + f^*(\alpha(\lambda_n), j) - \lambda_n f^*(\alpha(\lambda_n), j)}{p(x, j) - \varepsilon + p^*(\alpha(\lambda_n), j)} \le \frac{f(x, j) + f^*(\alpha(\lambda_n), j)}{p(x, j) + p^*(\alpha(\lambda_n), j)} + \frac{2\varepsilon}{1-\varepsilon} + \lambda_n.$$
Consequently,

$$w \le \sup_{x \in \Delta(I)}\; \sup_{\alpha \in M_+(I):\, \alpha \perp x}\; \min_{j \in J} \left[ \frac{f^*(x, j)}{p^*(x, j)}\, 1_{\{p^*(x, j) > 0\}} + \frac{f(x, j) + f^*(\alpha, j)}{p(x, j) + p^*(\alpha, j)}\, 1_{\{p^*(x, j) = 0\}} \right].$$
Step 2 of Theorem 1 needs no modification. The other proofs are adapted in a similar way.

Acknowledgment I would like to thank the two referees, Michel Balinski, Abraham Neyman, Eilon Solan, Sylvain Sorin and Xavier Venel for their very useful comments.
References

Abreu D, Milgrom P, Pearce D (1991) Information and timing in repeated partnerships. Econometrica 59:1713–1733
Aumann RJ, Maschler M (1995) Repeated games with incomplete information. MIT Press, Cambridge
Bewley T, Kohlberg E (1976a) The asymptotic theory of stochastic games. Math Oper Res 1:197–208
Bewley T, Kohlberg E (1976b) The asymptotic solution of a recursion equation occurring in stochastic games. Math Oper Res 1:321–336
Blackwell D, Ferguson T (1968) The big match. Ann Math Stat 39:159–163
Coulomb JM (2001) Repeated games with absorbing states and signaling structure. Math Oper Res 26:286–303
Fink AM (1964) Equilibrium in a stochastic N-person game. J Sci Hiroshima Univ 28:89–93
Flesch J, Thuijsman F, Vrieze K (1996) Recursive repeated games with absorbing states. Math Oper Res 21:1016–1022
Gillette D (1957) Stochastic games with zero stop probabilities. In: Tucker AW, Dresher M, Wolfe P (eds) Contributions to the theory of games, vol III. Annals of mathematical studies 39. Princeton University Press, Princeton, pp 179–187
Kohlberg E (1974) Repeated games with absorbing states. Ann Stat 2:724–738
Kohlberg E, Zamir S (1974) Repeated games of incomplete information: the symmetric case. Ann Stat 2:1040–1041
Laraki R (2001a) The splitting game and applications. Int J Game Theory 30:359–376
Laraki R (2001b) Variational inequalities, system of functional equations, and incomplete information repeated games. SIAM J Control Optim 40(2):516–524
Mertens J-F, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–66
Mertens J-F, Zamir S (1971) The value of two-person zero-sum repeated games with lack of information on both sides. Int J Game Theory 1:39–64
Mertens J-F, Neyman A, Rosenberg D (2009) Absorbing games with compact action spaces. Math Oper Res 34:257–262
Neyman A (2003) Stochastic games: existence of the minmax. In: Neyman A, Sorin S (eds) Stochastic games and applications. Kluwer Academic Publishers, Dordrecht, pp 173–193
Neyman A, Sorin S (1998) Equilibria in repeated games with incomplete information: the general symmetric case. Int J Game Theory 27:201–210
Rosenberg D, Sorin S (2001) An operator approach to zero-sum repeated games. Isr J Math 121:221–246
Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100
Solan E (1999) Three-player absorbing games. Math Oper Res 24:669–698
Sorin S (1986) Asymptotic properties of a non zero-sum stochastic game. Int J Game Theory 15:101–107
Sorin S (2002) A first course on zero-sum repeated games. Springer, Berlin
Vrieze K, Thuijsman F (1989) On equilibria in repeated games with absorbing states. Int J Game Theory 18:293–310