Information Delay in Games with Frequent Actions

David Rahman∗
University of Minnesota

June 23, 2013

Abstract

I study repeated games with frequent actions and obtain a Folk Theorem in strongly symmetric strategies for the Prisoners' Dilemma if public information can be delayed. I allow for both Brownian motions and Poisson processes.

JEL Classification: D21, D23, D82. Keywords: folk theorem, information delay, frequent actions.



∗Telephone number: +1 (612) 625 3525. E-mail address: [email protected]. I thank the European University Institute for hospitality during the Fall of 2012 and the National Science Foundation for financial support through Grant No. SES 09-22253.

I study the repeated Prisoners' Dilemma with imperfect public monitoring and frequent actions. I argue that changing the information structure so that players obtain it in lumps overturns the impossibility results of Sannikov and Skrzypacz (2007, 2010). To this end, I first show that the approach due to Abreu et al. (1990) fails to provide incentives in the frequent-actions model. Then I show that a version of Kandori and Matsushima (1998), despite delivering a Folk Theorem in discrete time, fails to do so in the continuous-time limit with Brownian information. I then offer an intermediate approach that delivers (virtually) full cooperation as players become unboundedly patient. In Rahman (2013), I extend these intermediate incentive schemes to derive a Folk Theorem without exogenously delaying information. Finally, at the end of this note I also derive a Folk Theorem when information follows a Poisson process.

Prisoners' Dilemma

Consider the Prisoners' Dilemma with imperfect public monitoring, repeated in discrete time, with common discount factor δ and the bi-matrix of flow payoffs below (row player's payoff listed first).

                 C                D
    C          u, u          −B, u + ∆u
    D      u + ∆u, −B           0, 0

Let u > ∆u − B. There are two publicly observed signals, g and b, with conditional probabilities determined by 0 < Pr(b|C,C) = p < q = Pr(b|C,D) = Pr(b|D,C) < 1. Strongly symmetric public equilibrium payoffs, v, solve (see, e.g., Abreu et al., 1990):

    v = (1 − δ)u + δ[(1 − p)v + p(1 − α)v]   ⇔   v = u − [δ/(1 − δ)] pαv,
    v ≥ (1 − δ)(u + ∆u) + δ[(1 − q)v + q(1 − α)v]   ⇔   [δ/(1 − δ)] αv ≥ ∆u/(q − p),

where α ∈ [0, 1] is the probability of mutual defection henceforth. Maximize v by changing v and α subject to these constraints. At an optimum, incentive compatibility will bind; otherwise α may be decreased to increase v. Substituting yields

    v = u − ∆u/∆ℓ,

where ℓ = q/p and ∆ℓ = ℓ − 1. The feasibility constraint that 0 ≤ αv ≤ v implies

    u ≥ (∆u/∆ℓ)[1 + (1 − δ)/(δp)].

Therefore, to attain the value v = u − ∆u/∆ℓ it is necessary for δ to be large enough for this inequality to be satisfied. Clearly, such δ exists strictly between 0 and 1.
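To fix ideas, here is a minimal numerical sketch in Python (the parameter values and helper names are mine, chosen purely for illustration) of the payoff v = u − ∆u/∆ℓ and the patience threshold implied by the feasibility constraint:

    # Sketch: best strongly symmetric payoff in the per-period scheme.
    # All parameter values below are illustrative, not from the paper.
    u, du = 1.0, 0.5            # u and Delta-u
    p, q = 0.2, 0.4             # Pr(b | C,C) and Pr(b | C,D)

    dl = q / p - 1.0            # Delta-ell = ell - 1, with ell = q/p
    v = u - du / dl             # best symmetric equilibrium payoff

    # Feasibility u >= (du/dl)*(1 + (1-delta)/(delta*p)) rearranges to
    # delta >= 1 / (1 + p*(u*dl/du - 1)).
    delta_min = 1.0 / (1.0 + p * (u * dl / du - 1.0))
    print(f"v = {v:.3f}, minimal discount factor = {delta_min:.3f}")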

Abreu et al. (1990) Fails with Frequent Actions

Abreu et al. (1990) proposed an argument in discrete time that allowed players to approach full efficiency as they grew patient, and thereby overcome the incentive cost of ∆u/∆ℓ above. Their insight was to delay the arrival of information, so instead of the public signal arriving every period, no information arrives until the end of a T-period block, where all the information arrives at once. Their equilibrium construction relies on the following strategies: Start with mutual cooperation for the entire T-period block. If at the end of the block the vector b^T is observed then mutual defection henceforth will occur with probability α. Otherwise, the cooperative phase just described continues for another block. Now, symmetric T-public equilibrium payoffs can be found similarly as above:

    v = (1 − δ^T)u + δ^T[(1 − p^T)v + p^T(1 − α)v]   ⇔   v = u − [δ^T/(1 − δ^T)] p^T αv.

Incentive compatibility requires one more argument. First, assume that a player deviates only in the first period. The gain from such a deviation is (1 − δ)∆u. The cost of the deviation is δ^T ∆ℓ p^T αv. So α must satisfy (1 − δ)∆u ≤ δ^T ∆ℓ p^T αv. It is easy to see that if this constraint is satisfied then all the incentive constraints in the T-period block are satisfied. Indeed, for τ deviations, the gain is at most (1 − δ)τ∆u, whereas the cost is δ^T(ℓ^τ − 1)p^T αv. Since ℓ > 1, it follows that ℓ^τ − 1 ≥ τ(ℓ − 1), which implies incentive compatibility with respect to τ deviations from incentive compatibility with respect to a single one. In other words, discouraging one deviation discourages all deviations. Recognizing that this constraint will bind at an optimum and rearranging yields

    v = u − [(1 − δ)/(1 − δ^T)] ∆u/∆ℓ.                                        (1)

Just as before, the feasibility constraint that 0 ≤ αv ≤ v implies that

    u ≥ (∆u/∆ℓ)[(1 − δ)/(1 − δ^T) + (1 − δ^T)/(δ^T p^T)].                     (2)

It is easy to see that for sufficiently large T and δ the feasibility constraint above is satisfied. Therefore, v → u − (1/T)(∆u/∆ℓ) as δ → 1 for large fixed T. Now, as T → ∞, v → u, which delivers a Folk Theorem in discrete time, as long as it is possible to delay information. Abreu et al. (1990) has the remaining details.
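The block scheme can be evaluated the same way. The sketch below (again with illustrative parameters of my choosing) computes (1) and checks (2), showing that raising T pushes v toward u while the p^T term pushes the required δ toward 1:

    # Sketch: value (1) and feasibility (2) for the T-period block scheme.
    # Parameters are illustrative.
    u, du, p, q = 1.0, 0.5, 0.2, 0.4
    dl = q / p - 1.0                       # Delta-ell

    def block_scheme(delta, T):
        v = u - (1 - delta) / (1 - delta**T) * du / dl          # equation (1)
        rhs = du / dl * ((1 - delta) / (1 - delta**T)
                         + (1 - delta**T) / (delta**T * p**T))  # equation (2)
        return v, u >= rhs

    for T in (1, 5, 10):
        v, feasible = block_scheme(delta=0.999, T=T)
        print(f"T={T:2d}: v={v:.4f}, feasible={feasible}")
    # v rises with T, but feasibility fails at delta=0.999 for the larger T:
    # the p**T term forces delta still closer to 1 before (2) holds.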

This argument fails with frequent actions, as Proposition 1 below shows. To see why, let δ = e^{−r∆t}, where r > 0 is the players' common discount rate and ∆t is the time interval between interactions. For a discrete-time Folk Theorem, the limiting set of equilibrium payoffs is calculated as r → 0 for fixed ∆t, whereas in the case of frequent actions r is fixed and the relevant limit is ∆t → 0. However, ℓ generally depends on ∆t but not on r. To complete the model, therefore, we must specify the way in which p, q and ℓ depend on ∆t. Abreu et al. (1990) already studied the case where these parameters converge to Poisson processes. Arguably, the remaining interesting case is when the limiting stochastic processes are Brownian motions. For this reason, define p and q in terms of the random walk representation of Brownian motion: p = ½[1 − (x/η)√∆t] and q = ½[1 − (y/η)√∆t], where x > y are the drifts of the Brownian motion that players observe and η > 0 is the volatility parameter.

Proposition 1. Punishment after T failures induces no cooperation with frequent actions and Brownian information: v → 0 as ∆t → 0, even if T depends on ∆t.

Proof. Our assumptions on p and q imply that ∆ℓ ≈ C√∆t for small ∆t > 0 and some constant C ∈ R. Feasibility requires satisfaction of (2) above, which in turn implies that δ^T → 1 as ∆t → 0.¹ Hence, since 1 − e^{−x} ≤ x for x > 0 but 1 − e^{−x} ≈ x for x > 0 small, the first term on the right-hand side of (2) is estimated by:

    [(1 − δ)/(1 − δ^T)] ∆u/∆ℓ ≈ [r∆t/(1 − δ^T)] ∆u/(C√∆t) ≥ [r∆t/(r∆tT)] ∆u/(C√∆t) = (1/T) ∆u/(C√∆t).      (3)

This term roughly corresponds to the efficiency loss from punishing bad news. From this expression it follows that efficiency losses explode unless T → ∞ as ∆t → 0. It might seem that, as long as T√∆t → ∞, these losses can be made arbitrarily small, leading to full cooperation. However, this violates feasibility. Indeed, from the second term on the right-hand side of (2), since p → ½ as ∆t → 0, it follows that δp ≤ 2/3 for small enough ∆t, so

    [(1 − δ^T)/(δ^T p^T)] ∆u/∆ℓ ≥ [(1 − δ)/(δ^T p^T)] ∆u/∆ℓ ≈ (3/2)^T r√∆t ∆u/C = (3/2)^T (1/T) rT√∆t ∆u/C.      (4)

Without loss, taking a convergent subsequence (perhaps to ∞) if necessary, T√∆t has limit λ ∈ [0, ∞]. If λ > 0, the right-hand side of (4) explodes, since (1/T)(3/2)^T → ∞. If λ = 0, the right-hand side of (3) explodes. Either way, feasibility fails.

¹ Otherwise, since δp < 1 and T ≥ 1, the second term on the right-hand side of (2),

    [(1 − δ^T)/(δ^T p^T)] ∆u/∆ℓ ≥ (1 − δ^T) ∆u/∆ℓ ≈ (1 − δ^T) ∆u/(C√∆t),

would clearly explode as ∆t → 0, and feasibility would fail.
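To see Proposition 1 at work numerically, the following sketch (with illustrative r, x, y, η of my choosing) plugs the random-walk probabilities into (2) and minimizes the right-hand side over all block lengths T; the minimized bound still explodes as ∆t → 0:

    from math import sqrt, exp

    # Sketch of Proposition 1: with the Brownian random-walk signals, the
    # right-hand side of feasibility condition (2), minimized over all T,
    # still explodes as dt -> 0. Parameter values are illustrative.
    u, du, r = 1.0, 0.5, 0.1
    x, y, eta = 1.0, 0.0, 1.0

    def rhs_of_2(dt, T):
        p = 0.5 * (1 - (x / eta) * sqrt(dt))       # Pr(b | C,C)
        q = 0.5 * (1 - (y / eta) * sqrt(dt))       # Pr(b | C,D)
        dl = q / p - 1.0                           # Delta-ell ~ C*sqrt(dt)
        d = exp(-r * dt)                           # per-period discount factor
        return du / dl * ((1 - d) / (1 - d**T) + (1 - d**T) / (d**T * p**T))

    for dt in (1e-4, 1e-6, 1e-8):
        best = min(rhs_of_2(dt, T) for T in range(1, 501))
        print(f"dt={dt:.0e}: min over T of rhs(2) = {best:10.1f} (vs u = {u})")
    # The minimized bound grows without bound, so (2) eventually fails for
    # every T: punishing T consecutive failures cannot sustain cooperation.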


Empirical Likelihood has Limited Success

There is another way of managing information that "partially survives" the frequent-actions limit as long as players are sufficiently patient. This is in stark contrast with the previous subsection, where nothing survived regardless of how patient players were. It is related to an example in Kandori and Matsushima (1998). As ∆t → 0, let T = ⌈c/∆t⌉, for some constant c to be specified later. Intuitively, c is the fixed calendar time of the T-period block. Count the number of g signals and b signals over a T-period block, and let the equilibrium strategies be as follows. Players plan mutual cooperation over the entire block. At the end of the block, if the number of b-signals is greater than or equal to ⌊pT⌋,² then mutual punishment is triggered with probability α. Otherwise, there is no punishment, the signal realizations of the block are forgotten, and the next block begins afresh. The (symmetric) lifetime average payoff to a player is now

    v = u − [δ^T/(1 − δ^T)] αv Π_0,

where Π_0 is the probability of triggering punishment when everyone cooperated. Letting t* = ⌈(1 − p)T⌉ and writing C(T,t) for the binomial coefficient, Π_0 may be written as

    Π_0 = Σ_{t=0}^{t*} C(T,t) p^{T−t}(1 − p)^t.

Incentive compatibility requires that any number of deviations between 1 and T be discouraged. It will be convenient to describe the deviations in terms of the fraction of deviating periods. For any fraction ρ ∈ {0, 1/T, 2/T, …, 1} of periods, let Π_ρ denote the probability of punishment when a player deviates ρT times. This is the probability of t* or fewer successes from T independent, but not identically distributed, Bernoulli trials: ρT of them have failure probability q, whereas (1 − ρ)T of them have failure probability p. Hoeffding (1956) shows that

    Π_ρ ≥ Σ_{t=0}^{t*} C(T,t) p̄^{T−t}(1 − p̄)^t =: Π̄_ρ,                      (5)

where p̄ = ρq + (1 − ρ)p is the arithmetic mean of the Bernoulli trial probabilities under consideration. The function Π̄_ρ just defined extends immediately to all ρ ∈ [0, 1], with an interpretation of mixed strategies, and clearly Π̄_0 = Π_0.

² The notation ⌊z⌋ denotes the greatest integer less than or equal to z; similarly, ⌈z⌉ stands for the smallest integer greater than or equal to z.
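The bound (5) can be verified exactly for small T by computing the Poisson-binomial distribution of successes directly; the sketch below does so under illustrative parameters (the helper prob_at_most is mine):

    from math import ceil

    # Sketch: exact check of the Hoeffding (1956) bound (5) with
    # illustrative p, q, T and a deviation fraction rho.
    p, q, T, rho = 0.2, 0.4, 20, 0.5
    k = round(rho * T)                     # number of deviating periods
    t_star = ceil((1 - p) * T)             # punish if successes <= t_star

    def prob_at_most(fail_probs, t_max):
        """Pr(at most t_max successes); success prob of trial i = 1 - f_i."""
        dist = [1.0]                       # dist[s] = Pr(s successes so far)
        for f in fail_probs:
            new = [0.0] * (len(dist) + 1)
            for s, pr in enumerate(dist):
                new[s] += pr * f           # this trial fails
                new[s + 1] += pr * (1 - f) # this trial succeeds
            dist = new
        return sum(dist[: t_max + 1])

    pi_rho = prob_at_most([q] * k + [p] * (T - k), t_star)  # true Pi_rho
    p_bar = rho * q + (1 - rho) * p                         # arithmetic mean
    bound = prob_at_most([p_bar] * T, t_star)               # bar-Pi_rho in (5)
    print(f"Pi_rho = {pi_rho:.4f} >= bar-Pi_rho = {bound:.4f}: {pi_rho >= bound}")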


Discouraging ρT deviations requires that their gain, bounded above by (1 − δ^{ρT})∆u, be outweighed by their loss, δ^T αv ∆Π_ρ, where ∆Π_ρ = Π_ρ − Π_0. Using (5) and letting ∆Π̄_ρ = Π̄_ρ − Π_0, this is clearly implied by the inequality (1 − δ^{ρT})∆u ≤ δ^T αv ∆Π̄_ρ. Substituting for T = c/∆t, and recognizing that 1 − δ^{ρT} ≤ rcρ (moreover, when ∆t is small, 1 − δ^{ρT} ≈ rcρ), we obtain the following incentive constraint:

    rc∆u ≤ δ^T αv ∆Π̄_ρ/ρ.                                                    (6)

The left-hand side of this incentive constraint is independent of ρ. The right-hand side depends on ρ only through the term ∆Π̄_ρ/ρ. From this term we can derive the most binding incentive constraint, because, as I argue next, ∆Π̄_ρ/ρ is monotone in ρ.

Lemma 1. Π̄_ρ is increasing and concave. Hence, ∆Π̄_ρ/ρ is minimized at ρ = 1.

Proof. The proof is straightforward. Since dp̄/dρ = q − p, the first derivative of Π̄_ρ is easily seen to equal

    Π̄′_ρ = (q − p) Σ_{t=0}^{t*} C(T,t)[(T − t) p̄^{T−1−t}(1 − p̄)^t − t p̄^{T−t}(1 − p̄)^{t−1}]
         = (q − p) T Σ_{t=0}^{t*} [C(T−1,t) p̄^{T−1−t}(1 − p̄)^t − C(T−1,t−1) p̄^{T−t}(1 − p̄)^{t−1}]
         = (q − p) T C(T−1,t*) p̄^{T−1−t*}(1 − p̄)^{t*} > 0,

where the last equality holds because the sum telescopes; therefore Π̄_ρ is increasing. Similarly, the second derivative equals

    Π̄″_ρ = (q − p)² T C(T−1,t*) [(T − 1 − t*) p̄^{T−2−t*}(1 − p̄)^{t*} − t* p̄^{T−1−t*}(1 − p̄)^{t*−1}]
         = (q − p)² T C(T−1,t*) p̄^{T−2−t*}(1 − p̄)^{t*−1} [(T − 1 − t*)(1 − p̄) − t* p̄]
         = (q − p)² T C(T−1,t*) p̄^{T−2−t*}(1 − p̄)^{t*−1} [(T − 1)(1 − p̄) − t*]
         = (q − p)² T C(T−1,t*) p̄^{T−2−t*}(1 − p̄)^{t*−1} [(T − 1)(1 − p̄) − ⌈(1 − p)T⌉]
         ≤ (q − p)² T C(T−1,t*) p̄^{T−2−t*}(1 − p̄)^{t*−1} [(T − 1)(1 − p̄) − (1 − p)T]
         = (q − p)² T C(T−1,t*) p̄^{T−2−t*}(1 − p̄)^{t*−1} [T(p − p̄) − (1 − p̄)] < 0

because p ≤ p̄. Therefore, Π̄_ρ is concave, so the chord slope ∆Π̄_ρ/ρ is decreasing in ρ and minimized at ρ = 1.
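Lemma 1 is also easy to confirm numerically. The sketch below (illustrative p, q, T) evaluates Π̄_ρ on a grid and checks monotonicity, concavity, and that the chord slope m(ρ) = ∆Π̄_ρ/ρ is smallest at ρ = 1:

    from math import ceil, comb

    # Sketch: numerical check of Lemma 1 on a grid (illustrative parameters).
    p, q, T = 0.2, 0.4, 40
    t_star = ceil((1 - p) * T)

    def pi_bar(rho):
        pb = rho * q + (1 - rho) * p       # arithmetic-mean failure probability
        return sum(comb(T, t) * pb**(T - t) * (1 - pb)**t
                   for t in range(t_star + 1))

    grid = [i / 100 for i in range(101)]
    vals = [pi_bar(r) for r in grid]
    slopes = [b - a for a, b in zip(vals, vals[1:])]
    print("increasing:", all(s > 0 for s in slopes))
    print("concave:   ", all(s2 <= s1 for s1, s2 in zip(slopes, slopes[1:])))
    m = [(vals[i] - vals[0]) / grid[i] for i in range(1, 101)]
    print("m(rho) minimized at rho =", grid[1:][m.index(min(m))])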

Figure 1 below provides intuition for Lemma 1.

[Figure 1: The cumulative normal has lowest slope between 0 and ρ((x − y)/η)√c at ρ = 1.]

It follows from Lemma 1 that, unlike in the equilibrium construction of Abreu et al. (1990), discouraging one deviation does not imply that all other deviations are discouraged. In fact, the most tightly binding incentive constraint is the one meant to discourage T deviations, corresponding to ρ = 1. Hence, if T deviations are discouraged then ρT deviations are also discouraged for all ρ less than 1, so satisfying (6) just for ρ = 1 guarantees incentive compatibility for all ρ ∈ (0, 1]. In trying to maximize v, players' symmetric payoffs, this tightest incentive constraint will bind. Substituting for αv then yields

    v = u − [rc∆u/(1 − e^{−rc})] Π_0/(Π_1 − Π_0),

since Π̄_1 = Π_1. As before, this v is feasible as long as α ≤ 1, therefore

    u ≥ [∆u/(Π_1 − Π_0)] [rc/e^{−rc} + rcΠ_0/(1 − e^{−rc})].                  (7)

To compute the limit of v as ∆t → 0, we invoke the Central Limit Theorem.

Lemma 2. For any ρ ∈ [0, 1],

    Π̄_ρ → Φ(ρ((x − y)/η)√c)   as   ∆t → 0,

where Φ is the standard normal cumulative distribution function.

Proof. Given ρ ∈ [0, 1], let {X_t} be a family of iid Bernoulli trials with failure probability p̄. By the Central Limit Theorem, as ∆t → 0,

    Π̄_ρ = Pr(Σ_t X_t ≤ ⌈(1 − p)T⌉) ≈ Pr(Σ_t X_t ≤ (1 − p)T)
         = Pr( [Σ_t X_t − (1 − p̄)T]/√(p̄(1 − p̄)T) ≤ (p̄ − p)√T/√(p̄(1 − p̄)) )
         ≈ Pr( [Σ_t X_t − (1 − p̄)T]/√(p̄(1 − p̄)T) ≤ ρ((x − y)/(2η))√c / √(p̄(1 − p̄)) )
         → Φ(ρ((x − y)/η)√c),

using the fact that T = ⌈c/∆t⌉ and √(p̄(1 − p̄)) → ½.

By Lemma 2, it follows that Π_0 → ½ and Π_1 → Φ(((x − y)/η)√c). It is easy to see, therefore, that feasibility would fail if T∆t converged to 0 or ∞ instead of to c ∈ (0, ∞). The incentive cost of this equilibrium construction converges, as ∆t → 0, to

    [rc∆u/(1 − e^{−rc})] · ½ / [Φ(((x − y)/η)√c) − ½].

Although for fixed r > 0 feasibility bounds c from above, as r → 0 it is possible to make c unboundedly large, leading to an infimal incentive cost of ∆u.

Proposition 2. The best symmetric equilibrium payoff v from punishing ⌊pT⌋ or more failures, when actions become frequent and players become patient, converges to

    lim_{r→0} lim_{∆t→0} v = u − ∆u   if u > ∆u,   and 0 otherwise.

Proof. By Lemma 2, as ∆t → 0 the right-hand side of (7) converges to

    F(r, c) := [∆u/(Φ(((x − y)/η)√c) − ½)] [rc/e^{−rc} + ½ rc/(1 − e^{−rc})].

For each c, clearly F(r, c) → F(c) := ½∆u/(Φ(((x − y)/η)√c) − ½) as r → 0, and given ε > 0 there exists r̄ > 0 such that F(r, c) − F(c) < ε for all r < r̄. Moreover, F(c) → ∆u as c → ∞, since Φ(((x − y)/η)√c) → 1, and given ε > 0 there exists c̄ < ∞ such that F(c) − ∆u < ε for all c > c̄. If u > ∆u then there exists ε̄ > 0 such that u − ∆u ≥ 2ε for all 0 < ε < ε̄. Letting c > c̄ and r < r̄, it follows that

    u − F(r, c) = u − ∆u − [F(r, c) − F(c)] − [F(c) − ∆u] > u − ∆u − 2ε ≥ 0,

implying feasibility (7). Since ε > 0 was arbitrarily small, the result follows.
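The limiting bound F(r, c) from the proof above can be evaluated directly; the sketch below (illustrative ∆u, x, y, η) traces a sequence with r → 0 and c → ∞ and shows the incentive cost approaching, but never falling below, ∆u:

    from math import exp, sqrt
    from statistics import NormalDist

    # Sketch: the limiting feasibility bound F(r, c) from the proof of
    # Proposition 2 approaches Delta-u but no further. Illustrative values.
    du, x, y, eta = 0.5, 1.0, 0.0, 1.0
    Phi = NormalDist().cdf

    def F(r, c):
        z = (x - y) / eta * sqrt(c)
        rc = r * c
        return du / (Phi(z) - 0.5) * (rc / exp(-rc) + 0.5 * rc / (1 - exp(-rc)))

    for r, c in ((0.1, 5), (0.01, 20), (0.001, 100), (0.0001, 200)):
        print(f"r={r:g}, c={c}: F = {F(r, c):.4f}   (Delta-u = {du})")
    # F decreases toward Delta-u = 0.5 but not below it: the incentive cost
    # never vanishes, so cooperation stays bounded away from full efficiency.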

By Proposition 2, if u is big enough relative to ∆u, it is possible to obtain a symmetric equilibrium outcome that is strictly better than static Nash, and it is still the case that more outcomes are attainable as players become patient. However, we remain uniformly bounded away from full cooperation, even as players become patient. To see why, notice that the tightest incentive constraint is the one associated with deviating every period, and as such the equilibrium incentive cost does not become negligible.

Intermediate Approach

Let us now consider an approach to providing incentives that lies somewhere in between Abreu et al. (1990) and Kandori and Matsushima (1998). I will amend the previous subsection's construction as follows. Just as in the previous subsection, let T = ⌈c/∆t⌉, for some constant c to be specified later. Count the number of g realizations and b realizations over a T-period block, and let the equilibrium strategies be as follows. Players plan mutual cooperation over the entire block. At the end of the block, if the number of b-signals is greater than or equal to ⌈qT⌉ (instead of ⌊pT⌋), then mutual punishment is triggered with probability α. Otherwise, there is no punishment, the signal realizations of the block are discarded, and the next block begins afresh. The key difference here is that punishment takes place after more failures than in the arrangement of Kandori and Matsushima (1998, Section 5), but fewer than in that of Abreu et al. (1990, Section 4). However, as ∆t → 0, q − p → 0, so the difference between the number of failures required in this subsection versus the previous subsection vanishes, albeit slowly (at rate √∆t). I will now show that discouraging just one deviation is enough to discourage them all, unlike the previous subsection. Moreover, I will prove a Folk Theorem: as players become (unboundedly) patient, equilibrium payoffs reflect full cooperation, so v → u. The lifetime average payoff to a player is now

    v = u − [δ^T/(1 − δ^T)] αv Π*_0,

where

    Π*_0 = Σ_{t=0}^{t*} C(T,t) p^{T−t}(1 − p)^t   and   t* = ⌊(1 − q)T⌋.

As in the previous section, consider a player's plan to deviate. Let Π*_ρ be the probability of at most t* successes given ρT defections. Boland et al. (2004) shows that

    Π*_ρ ≥ Σ_{t=0}^{t*} C(T,t) p̃^{T−t}(1 − p̃)^t =: Π̄*_ρ,

where p̃ = q^ρ p^{1−ρ} is the geometric mean of the Bernoulli trial probabilities under consideration. By the same calculations as those in the previous subsection, letting ∆Π̄*_ρ = Π̄*_ρ − Π*_0, we obtain the following incentive constraint:

    rc∆u ≤ δ^T αv ∆Π̄*_ρ/ρ.

Therefore, just as before, the tightest incentive constraint is the one associated with the fraction ρ that minimizes ∆Π̄*_ρ/ρ. This is found in the next result.


Lemma 3. Π̄*_ρ is increasing and convex. Hence, m(ρ) = ∆Π̄*_ρ/ρ is minimized at ρ = 0, where m(0) is defined as the limit of m(ρ) as ρ → 0.

Proof. The proof is straightforward. Since dp̃/dρ = p̃ ln(q/p), the first derivative of Π̄*_ρ is easily seen to equal

    Π̄*′_ρ = p̃ ln(q/p) Σ_{t=0}^{t*} C(T,t)[(T − t) p̃^{T−1−t}(1 − p̃)^t − t p̃^{T−t}(1 − p̃)^{t−1}]
          = p̃ ln(q/p) T Σ_{t=0}^{t*} [C(T−1,t) p̃^{T−1−t}(1 − p̃)^t − C(T−1,t−1) p̃^{T−t}(1 − p̃)^{t−1}]
          = p̃ ln(q/p) T C(T−1,t*) p̃^{T−1−t*}(1 − p̃)^{t*} = ln(q/p)(T − t*) C(T,t*) p̃^{T−t*}(1 − p̃)^{t*} > 0.

Therefore, Π̄*_ρ is increasing. Similarly,

    Π̄*″_ρ = p̃ ln(q/p)² (T − t*) C(T,t*) [(T − t*) p̃^{T−t*−1}(1 − p̃)^{t*} − t* p̃^{T−t*}(1 − p̃)^{t*−1}]
          = p̃ ln(q/p)² (T − t*) C(T,t*) p̃^{T−t*−1}(1 − p̃)^{t*−1} [(T − t*)(1 − p̃) − t* p̃]
          = p̃ ln(q/p)² (T − t*) C(T,t*) p̃^{T−t*−1}(1 − p̃)^{t*−1} [T(1 − p̃) − t*]
          ≥ p̃ ln(q/p)² (T − t*) C(T,t*) p̃^{T−t*−1}(1 − p̃)^{t*−1} [T(1 − p̃) − T(1 − q)]
          = p̃ ln(q/p)² (T − t*) C(T,t*) p̃^{T−t*−1}(1 − p̃)^{t*−1} T(q − p̃) ≥ 0,

because q ≥ p̃ and t* = ⌊(1 − q)T⌋ ≤ (1 − q)T. Therefore, Π̄*_ρ is convex.
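As with Lemma 1, a numerical check of Lemma 3 is straightforward; the sketch below (illustrative parameters) confirms that Π̄*_ρ is increasing and convex on a grid, so that the chord slope is now minimized as ρ → 0:

    from math import floor, comb

    # Sketch: numerical check of Lemma 3 (illustrative parameters), with
    # threshold t* = floor((1-q)T) and the geometric-mean bound.
    p, q, T = 0.2, 0.4, 40
    t_star = floor((1 - q) * T)

    def pi_star_bar(rho):
        pt = q**rho * p**(1 - rho)         # geometric-mean failure probability
        return sum(comb(T, t) * pt**(T - t) * (1 - pt)**t
                   for t in range(t_star + 1))

    grid = [i / 100 for i in range(101)]
    vals = [pi_star_bar(r) for r in grid]
    slopes = [b - a for a, b in zip(vals, vals[1:])]
    print("increasing:", all(s > 0 for s in slopes))
    print("convex:    ", all(s2 >= s1 for s1, s2 in zip(slopes, slopes[1:])))
    m = [(vals[i] - vals[0]) / grid[i] for i in range(1, 101)]
    print("m(rho) minimized at rho =", grid[1:][m.index(min(m))])
    # Unlike Lemma 1, the chord slope is smallest as rho -> 0, so one
    # deterred deviation deters them all.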

Figure 2 below provides intuition for Lemma 3, which shows that discouraging one deviation discourages all deviations. For fixed T, satisfying the limiting incentive constraint as ρ → 0 implies incentive compatibility for all ρ ∈ [0, 1].


[Figure 2: The cumulative normal has lowest slope from (1 − ρ)((x − y)/η)√c to ((x − y)/η)√c at ρ = 0.]

In trying to maximize v, this tightest incentive constraint will necessarily bind. Substituting for αv yields

    v = u − [rc∆u/(1 − e^{−rc})] Π*_0/Π̄*′_0.

As in previous subsections, this is feasible if

    u ≥ (∆u/Π̄*′_0) [rc/e^{−rc} + rcΠ*_0/(1 − e^{−rc})].                      (8)

The limit of v as ∆t → 0 may now be computed similarly to the previous subsection.

Lemma 4. For any ρ ∈ [0, 1],

    Π̄*_ρ → 1 − Φ((1 − ρ)((x − y)/η)√c)   and   Π̄*′_ρ → ϕ((1 − ρ)((x − y)/η)√c) ((x − y)/η)√c   as   ∆t → 0,

where ϕ is the standard normal probability density function.

Proof. Given ρ ∈ [0, 1], let {X_t} be a family of iid Bernoulli trials with failure probability p̃. By the Central Limit Theorem, as ∆t → 0,

    Π̄*_ρ = Pr(Σ_t X_t ≤ ⌊(1 − q)T⌋) ≈ Pr(Σ_t X_t ≤ (1 − q)T)
          = Pr( [Σ_t X_t − (1 − p̃)T]/√(p̃(1 − p̃)T) ≤ (p̃ − q)√T/√(p̃(1 − p̃)) )
          ≈ Pr( [Σ_t X_t − (1 − p̃)T]/√(p̃(1 − p̃)T) ≤ −½(1 − ρ)((x − y)/η)√c / √(p̃(1 − p̃)) )
          → 1 − Φ((1 − ρ)((x − y)/η)√c),

using the fact that p̃ ≈ p̄ for large T,³ T = ⌈c/∆t⌉ and √(p̃(1 − p̃)) → ½. Similarly,

    Π̄*′_ρ = ln(q/p)(T − t*) C(T,t*) p̃^{T−t*}(1 − p̃)^{t*} ≈ qT ln(q/p) C(T,t*) p̃^{T−t*}(1 − p̃)^{t*}
           = [√T ln(q/p)] · [q√T C(T,t*) p̃^{T−t*}(1 − p̃)^{t*}] = (a) · (b).

By the de Moivre–Laplace Theorem and previous derivations, (b) above satisfies

    q√T C(T,t*) p̃^{T−t*}(1 − p̃)^{t*} ≈ [q/√(2π p̃(1 − p̃))] exp( −[t* − T(1 − p̃)]²/(2T p̃(1 − p̃)) )
        = [q/√(2π p̃(1 − p̃))] exp( −[T(p̃ − q)]²/(2T p̃(1 − p̃)) )
        ≈ [q/√(2π p̃(1 − p̃))] exp( −¼[(1 − ρ)((x − y)/η)√c]² T/(2T p̃(1 − p̃)) )
        → (1/√(2π)) exp( −½[(1 − ρ)((x − y)/η)√c]² ) = ϕ((1 − ρ)((x − y)/η)√c).

Finally, by taking a first-order Taylor series expansion of q/p with respect to √∆t around √∆t = 0, it follows that (a) satisfies, as ∆t → 0,

    √T ln(q/p) ≈ √c ln( [(1 − (y/η)√∆t)/(1 − (x/η)√∆t)]^{1/√∆t} ) ≈ √c ln( (1 + ((x − y)/η)√∆t)^{1/√∆t} ) → ((x − y)/η)√c,

as required.

By Lemma 4, it follows that Π*_0/Π̄*′_0 → (1 − Φ(z))/(ϕ(z)z), where z = ((x − y)/η)√c. This limit determines the incentive cost of the proposed intermediate scheme. It is well known that the normal distribution has a linearly exploding hazard rate, therefore (1 − Φ(z))/(ϕ(z)z) → 0 as c → ∞. This suggests the possibility of sustaining full cooperation as r → 0.

Proposition 3. The best symmetric equilibrium payoff from punishing ⌈qT⌉ or more failures, when actions become frequent and players become patient, converges to the value of full cooperation:

    lim_{r→0} lim_{∆t→0} v = u.

³ Specifically, the first-order Taylor series expansion of p̃ with respect to √∆t around √∆t = 0 is easily shown to equal ½(1 − [ρ(y/η) + (1 − ρ)(x/η)]√∆t) = p̄. Therefore, p̃ − q ≈ −½(1 − ρ)((x − y)/η)√∆t.


Proof. By Lemma 4, as ∆t → 0 the right-hand side of (8) converges to

    G(r, c) := [∆u/(ϕ(z)z)] [rc/e^{−rc} + rc(1 − Φ(z))/(1 − e^{−rc})].

For each c, clearly G(r, c) → G(c) := ∆u(1 − Φ(z))/(ϕ(z)z) as r → 0, and given ε > 0 there exists r̄ > 0 such that G(r, c) − G(c) < ε for all r < r̄. Moreover, G(c) → 0 as c → ∞, and given ε > 0 there exists c̄ < ∞ such that G(c) < ε for all c > c̄. If u > 0 then there exists ε̄ > 0 such that u ≥ 2ε for all 0 < ε < ε̄. Letting c > c̄ and r < r̄,

    u − G(r, c) = u − [G(r, c) − G(c)] − G(c) > u − 2ε ≥ 0,

implying feasibility (8). Since ε > 0 was arbitrarily small, the result follows.
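The limiting bound G(r, c) from this proof can also be traced numerically; in the sketch below (illustrative values), G vanishes along a sequence with r → 0 and c → ∞, in contrast with F(r, c) from Proposition 2:

    from math import exp, sqrt
    from statistics import NormalDist

    # Sketch: the limiting feasibility bound G(r, c) from the proof of
    # Proposition 3 vanishes as r -> 0 and c -> infinity. Illustrative values.
    du, x, y, eta = 0.5, 1.0, 0.0, 1.0
    Phi, phi = NormalDist().cdf, NormalDist().pdf

    def G(r, c):
        z = (x - y) / eta * sqrt(c)
        rc = r * c
        return du / (phi(z) * z) * (rc / exp(-rc)
                                    + rc * (1 - Phi(z)) / (1 - exp(-rc)))

    for r, c in ((1e-3, 5), (1e-5, 10), (1e-8, 20)):
        print(f"r={r:g}, c={c}: G = {G(r, c):.4f}")
    # G -> 0 because the normal hazard rate phi(z)/(1 - Phi(z)) grows
    # linearly in z, so full cooperation obtains in the limit.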

Proposition 3 shows that, with the right punishment scheme, delaying the arrival of information alleviates the incentive problem in the Prisoners' Dilemma with Brownian information so much that full cooperation is asymptotically possible as long as players are sufficiently patient, even in the continuous-time limit. The main reason is that we were able to design a scheme such that the tightest incentive constraint involved deviating only once in an arbitrarily large T-period block. To summarize: information delay yields asymptotic efficiency in repeated games with frequent actions as players become unboundedly patient. The order of limits matters in this statement: first ∆t → 0 and then r → 0. Taking limits in the opposite order is the standard approach of discrete-time repeated games, whereas taking limits the way I just did is consistent with, e.g., Sannikov (2007); Sannikov and Skrzypacz (2007, 2010); Faingold and Sannikov (2011). All these papers argue that a Folk Theorem is not possible in continuous time when value-burning is necessary. However, the argument above suggests that such a theorem is indeed possible once information is delayed. In Rahman (2013) I extend the arguments of this paper and apply them to the study of communication equilibria in repeated games, where the information that is delayed is the mediator's endogenous recommendations rather than players' exogenous signals.

Poisson Arrivals

Abreu et al. (1990) showed that if information follows a Poisson process and this information is "bad news" then some positive value may be attainable with frequent actions, but not in the "good news" case. This is because in the bad news case ∆ℓ does not vanish as ∆t → 0.

For completeness, in a simple application of the broad approach above, I show next that delaying information easily yields full efficiency for sufficiently patient players in either case. Indeed, suppose that the signal arrives according to a Poisson process with arrival rate λ_n, where n is the number of cooperators. The probability of k arrivals in the time interval [0, c) equals

    Pr(k|n) = (λ_n c)^k e^{−λ_n c}/k!.

The maximum likelihood ratio of arrivals between two cooperators and just one equals

    sup_k Pr(k|1)/Pr(k|2) = e^{(λ_2−λ_1)c} if λ_1 ≤ λ_2,   and ∞ if λ_1 > λ_2.

If it were possible to hide arrivals until the end of a period of calendar length c then the Folk Theorem could easily be restored (except when λ_1 = λ_2, of course). First, suppose that λ_2 > λ_1, and consider a block of length c. If there is an arrival then continuation values do not change, but if no signal arrives then both players will revert to mutual defection henceforth with probability α to be determined. Hence, on the equilibrium path strongly symmetric equilibrium payoffs equal

    v = u − [δ^T/(1 − δ^T)] αv Π′_0,

where Π′_0 = Pr(0|2). Let Π′_ρ = e^{−[λ_1 ρ + λ_2 (1−ρ)]c} = e^{−λ_2 c} e^{(λ_2−λ_1)ρc} be the punishment probability if a player defects during a fraction ρ of the time. Following the previous derivations, incentive compatibility is implied by rc∆u ≤ δ^T αv ∆Π′_ρ/ρ, where ∆Π′_ρ = Π′_ρ − Π′_0. Clearly, Π′_ρ is convex in ρ, since λ_2 > λ_1 by hypothesis, therefore the tightest incentive constraint is given by ρ → 0, that is

    rc∆u ≤ δ^T αv (λ_2 − λ_1) c e^{−λ_2 c}.

Feasibility of this mechanism therefore follows from

    u ≥ [∆u/((λ_2 − λ_1) c e^{−λ_2 c})] [rc/e^{−rc} + rc e^{−λ_2 c}/(1 − e^{−rc})].      (9)

For fixed c, the terms inside the brackets of (9) converge to e^{−λ_2 c} as r → 0, so the right-hand side tends to ∆u/((λ_2 − λ_1)c), which in turn vanishes as c grows. This yields an easy Folk Theorem; its proof is omitted.

Proposition 4. If λ_1 < λ_2 then

    lim_{r→0} lim_{∆t→0} v = u.
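The following sketch evaluates the right-hand side of (9) under illustrative arrival rates λ_1 < λ_2, confirming that patience plus long blocks drive the bound to zero:

    from math import exp

    # Sketch: the right-hand side of (9) in the good-news Poisson case,
    # with illustrative arrival rates lam1 < lam2.
    du, lam1, lam2 = 0.5, 0.5, 1.0

    def rhs_of_9(r, c):
        brackets = (r * c / exp(-r * c)
                    + r * c * exp(-lam2 * c) / (1 - exp(-r * c)))
        return du / ((lam2 - lam1) * c * exp(-lam2 * c)) * brackets

    for r, c in ((1e-3, 2), (1e-5, 5), (1e-8, 10)):
        print(f"r={r:g}, c={c}: rhs(9) = {rhs_of_9(r, c):.4f}")
    # As r -> 0 with c fixed the bound tends to du/((lam2 - lam1)*c), which
    # is then made arbitrarily small by lengthening the block (larger c).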

Moreover, a Folk Theorem is still attainable if λ_1 > λ_2. To see this, let Π_{k0} = Pr(k|2) and assume that players revert to permanent mutual defection with probability α_k after k arrivals. Write λ̄ = ρλ_1 + (1 − ρ)λ_2, and Π_{kρ} for the probability of k arrivals after deviating a fraction ρ of the time in a block of length c.

Lemma 5. If λ_1 > λ_2 then for every c ∈ (0, ∞) there exists k ∈ N sufficiently large that Π_{kρ} is strictly increasing and strictly convex in ρ.

Proof. The proof is easy. Writing Π_{kρ} = x^k e^{−x}/k!, where x = λ̄c, it follows that

    Π′_{kρ} = (1/k!)[k x^{k−1} e^{−x} − x^k e^{−x}](λ_1 − λ_2)c
            = (1/k!)(k − x) x^{k−1} e^{−x} (λ_1 − λ_2)c,

which is clearly positive for sufficiently large k. Therefore, Π_{kρ} can be made strictly increasing. For convexity, taking the next derivative yields

    Π″_{kρ} = (1/k!)[(k − x)(k − 1) x^{k−2} e^{−x} − x^{k−1} e^{−x} − (k − x) x^{k−1} e^{−x}]((λ_1 − λ_2)c)²
            = (1/k!)[−x + (k − x)(k − 1) − (k − x)x] x^{k−2} e^{−x} ((λ_1 − λ_2)c)²
            = (1/k!)[(k − x)² − k] x^{k−2} e^{−x} ((λ_1 − λ_2)c)²,

which again is clearly positive for large enough k, so Π_{kρ} is strictly convex, too.

By convexity, the tightest incentive constraint again corresponds to ρ → 0.

Proposition 5. If λ_1 > λ_2 then also

    lim_{r→0} lim_{∆t→0} v = u.

Proof. Similarly to previous exercises, feasibility requires that

    u ≥ (∆u/Π′_{kρ}) [rc/e^{−rc} + rcΠ_{kρ}/(1 − e^{−rc})],

where, given c, k is large enough that Π_{kρ} is convex. It remains to show that Π_{kρ}/Π′_{kρ} → 0 as k → ∞. But this follows because Π_{kρ}/Π′_{kρ} = x/[(k − x)(λ_1 − λ_2)c] → 0 as k → ∞. Hence, for any given c, if players are sufficiently patient then the above arrangement is feasible and incurs a negligible incentive cost.
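Finally, a sketch of Lemma 5 and the cost ratio from the proof of Proposition 5, under illustrative rates λ_1 > λ_2: convexity of Π_{kρ} fails for small k but is restored for large k, while the cost ratio vanishes:

    from math import exp, factorial

    # Sketch of Lemma 5 and Proposition 5: in the bad-news case
    # (lam1 > lam2), a large enough arrival threshold k makes Pi_k(rho)
    # convex in rho and drives the cost ratio Pi_k/Pi_k' at rho = 0 to
    # zero. Arrival rates and block length are illustrative.
    lam1, lam2, c = 2.0, 1.0, 3.0

    def pi_k(rho, k):
        x = (rho * lam1 + (1 - rho) * lam2) * c    # mean arrivals given rho
        return x**k * exp(-x) / factorial(k)

    x0 = lam2 * c                                  # on-path mean arrivals
    for k in (5, 10, 20):
        vals = [pi_k(i / 50, k) for i in range(51)]
        slopes = [b - a for a, b in zip(vals, vals[1:])]
        convex = all(s2 >= s1 for s1, s2 in zip(slopes, slopes[1:]))
        ratio = x0 / ((k - x0) * (lam1 - lam2) * c)  # Pi_k/Pi_k' at rho = 0
        print(f"k={k:2d}: convex on [0,1]: {convex}, cost ratio = {ratio:.4f}")
    # k = 5 fails convexity (since (k - x)^2 < k for x near lam1*c), while
    # larger k restores it and shrinks the incentive cost as k grows.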

References

Abreu, D., P. Milgrom, and D. G. Pearce (1990): "Information and Timing in Repeated Partnerships," Econometrica, 59, 1713–1733.

Boland, P., H. Singh, and B. Cukic (2004): "The Stochastic Precedence Ordering with Applications in Sampling and Testing," Journal of Applied Probability, 41, 73–82.

Faingold, E. and Y. Sannikov (2011): "Reputation in Continuous-Time Games," Econometrica, 79, 773–876.

Hoeffding, W. (1956): "On the Distribution of the Number of Successes in Independent Trials," The Annals of Mathematical Statistics, 27, 713–721.

Kandori, M. and H. Matsushima (1998): "Private Observation, Communication, and Collusion," Econometrica, 66, 627–652.

Rahman, D. (2013): "Frequent Actions with Infrequent Coordination," Working paper.

Sannikov, Y. (2007): "Games with Imperfectly Observable Actions in Continuous Time," Econometrica, 75, 1285–1329.

Sannikov, Y. and A. Skrzypacz (2007): "Impossibility of Collusion under Imperfect Monitoring with Flexible Production," The American Economic Review, 97, 1794–1823.

——— (2010): "The Role of Information in Repeated Games with Frequent Actions," Econometrica, 78, 847–882.

