Three Steps Ahead
Yuval Heller Nuffield College & Department of Economics, Oxford
UCLA, January 2014
Introduction
Background:
- Evidence suggests that people systematically deviate from payoff-maximizing behavior, and these biases have economic implications.
- In some cases, the biases cannot be attributed to complexity costs.
- There is substantial heterogeneity of the biases in the population.

Research approach:
- People base their choices on heuristics (rules of thumb).
- Different heuristics "compete" in a process of cultural learning.
- Understanding how biases survive these competitive forces can yield a better understanding of the biases and their implications.
Introduction
Limited Foresight
Stylized facts:
- People look only a few steps ahead in long games.
- Some subjects systematically look fewer steps ahead than others.

Examples: repeated Prisoner's Dilemma (Selten & Stoecker, 1986), Centipede games (McKelvey & Palfrey, 1992), sequential bargaining (Johnson et al., 2002).
Related bias: limited iterative thinking in one-stage games ("level-k").

Questions:
1. How do the naive agents survive?
2. Why is there no "arms race" for better foresight?
Introduction
Research Objectives
1. Characterize a stable state in which all agents have short foresight, and some agents look further ahead than others.
2. Determine under which additional assumptions this stable state is unique.
3. Provide a novel explanation for cooperative behavior in long finite games.
   - All types have the same maximal payoff (unlike Kreps et al., 1982).
Model
Outline
1. Model
2. Stability Analysis
3. Characterization of a Nash Equilibrium
4. Evolutionary Stability
5. Uniqueness
6. Discussion
Model
Overview
A large population of agents interacts in a repeated Prisoner's Dilemma.
An agent's type determines (1) his foresight ability and (2) his behavior.
Foresight abilities are partially observable.
More successful types become more frequent (payoff-monotonic dynamics), through cultural learning or biological evolution.
- E.g., replicator dynamics: the number of offspring is proportional to payoffs.

Rarely, a few agents experiment with a new type ("mutants").
Objective: characterize the stable states of the population in the long run.
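The replicator dynamics can be illustrated with a short, self-contained simulation. This is a generic sketch rather than the paper's model: two types play a hypothetical Hawk-Dove-style game, and the payoff-monotonic dynamics converge to the interior mix at which both types earn the population-average payoff.

```python
def replicator_step(x, payoff, dt=0.01):
    """One discrete step of the replicator dynamics: a type's frequency
    grows in proportion to how much its expected payoff exceeds the
    population average."""
    n = len(x)
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    avg = sum(x[i] * fitness[i] for i in range(n))
    return [x[i] + dt * x[i] * (fitness[i] - avg) for i in range(n)]

# Hypothetical Hawk-Dove-style payoffs: each type does better when rare,
# so the dynamics settle on a heterogeneous population.
payoff = [[1.0, 3.0],
          [4.0, 0.0]]
x = [0.9, 0.1]
for _ in range(20000):
    x = replicator_step(x, payoff)
# x converges to the mix that equalizes the two types' payoffs (here 1/2, 1/2).
```

The same interior-rest-point logic underlies the stable heterogeneous population of foresight abilities characterized later in the talk.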
Model
Types and Populations
The type of each agent has two components:
- Foresight ability, from {L1, L2, ..., Lk, ...}: how early the agent becomes aware of the final period of the game and its strategic implications.
- Behavior in the repeated Prisoner's Dilemma: in which situations the agent cooperates, and in which he defects.

State of the population: a distribution over the set of types.
- Incumbents: types with positive frequency.
Model
Repeated Prisoner's Dilemma
Agents are randomly matched and play a repeated Prisoner's Dilemma.
- The game has a random length T (geometric distribution).
- At each round there is a continuation probability δ close to 1.

The payoffs at each stage are (A > 1):

              C            D
    C       A, A        0, A+1
    D      A+1, 0        1, 1
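As a concrete sketch of this slide, the stage game and the random length can be encoded as follows; the specific values A = 10 and δ = 0.95 are hypothetical, and only A > 1 with δ close to 1 is assumed in the model.

```python
import random

A = 10        # cooperation payoff, assumed A > 1
delta = 0.95  # continuation probability, close to 1

# Stage-game payoffs: (row player's payoff, column player's payoff).
STAGE = {
    ('C', 'C'): (A, A),
    ('C', 'D'): (0, A + 1),
    ('D', 'C'): (A + 1, 0),
    ('D', 'D'): (1, 1),
}

def sample_length(delta, rng=random.random):
    """T is geometric: after each round the game continues w.p. delta."""
    t = 1
    while rng() < delta:
        t += 1
    return t

# Expected length is 1 / (1 - delta), i.e., 20 rounds for delta = 0.95.
```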
Model
Information Structure
An agent with ability Lk is informed about the realized length k rounds before the end.
- Horizon: the number of remaining stages.

Partial observability of abilities (a la Dekel, Ely & Yilankaya, 2007):
- Each agent observes the opponent's ability with probability p.
- With probability 1 − p he receives a non-informative signal: the opponent is a stranger.

All signals are private.
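A minimal sketch of this signal structure; the function name and the `'stranger'` label are mine, not the paper's:

```python
import random

def observe_opponent(opponent_ability, p, rng=random.random):
    """With probability p the agent observes the opponent's true ability;
    otherwise he receives only the non-informative signal 'stranger'.
    Each agent draws his own private signal, so one side of a match may
    be informed while the other is not."""
    return opponent_ability if rng() < p else 'stranger'
```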
Stability Analysis
Stability Analysis
Static Stability Analysis
The payoff-monotonic population dynamics correspond to a static auxiliary game:

    Payoff-monotonic population dynamics     ↔   Static auxiliary game
    Random matching in a single population   ↔   Symmetric 2-player game
    Feasible types                           ↔   Feasible actions
    State of the population                  ↔   Mixed strategy
    Necessary condition for stability        ↔   Symmetric Nash equilibrium
    Stable state in a broad set of dynamics  ↔   Evolutionarily stable strategy

Stability is robust to the presence of sophisticated agents.
References: Nash (1950, thesis), Maynard Smith & Price (1973), Taylor & Jonker (1978), Thomas (1985), Bomze (1986), Cressman (1990, 1997), Sandholm (2010).
Stability Analysis
Auxiliary Static Game
- Stage 0: each player chooses an ability from {L1, L2, ..., Lk, ...}.
  - At the end of stage 0: partial observability of types.
- Stages 1 to T:
  - Each player decides whether to cooperate or defect at each stage.
  - A player with ability Lk is informed about the realized length k rounds before the end.
Stability Analysis
Strategies in the Auxiliary Game
A (behavior) strategy is σ = (µ, β):
- µ: a distribution over abilities.
- β(I): the probability of defection at each information set (the playing rule).

Each information set has 4 components:
1. The foresight ability of the player (Lk).
2. The horizon: how many rounds remain (if known).
3. The signal about the opponent's ability.
4. The public history of actions in the previous rounds.

u(σ, σ′): the expected payoff of a player who follows strategy σ and faces an opponent who follows strategy σ′.
Characterization of a Nash Equilibrium
Characterization of a Nash Equilibrium
Definition (σ* = (µ*, b*))
Two abilities: µ*(L3) = 1/(p·(A−1)), µ*(L1) = 1 − µ*(L3).

Deterministic simple behavior b*: everyone plays perfect tit-for-tat at unknown horizons.
- Perfect tit-for-tat: defect iff the players played different actions in the previous round.
- L1 agents: defect at the last round.
- L3 agents: defect at the last two rounds.
  - At horizon 3: play perfect tit-for-tat against strangers and L1; defect otherwise.

Theorem
σ* is a symmetric Nash equilibrium for all A/(A-1)^2 < p < (A-1)/A (A = 10: 10% < p < 90%).
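The equilibrium frequencies and the theorem's parameter region can be checked numerically from the formulas on this slide; a small sketch (function names are mine):

```python
def p_bounds(A):
    """Range of observability p for which sigma* is a symmetric Nash
    equilibrium: A/(A-1)^2 < p < (A-1)/A."""
    return A / (A - 1) ** 2, (A - 1) / A

def mu_star(A, p):
    """Equilibrium frequencies: mu*(L3) = 1/(p*(A-1))."""
    mu3 = 1 / (p * (A - 1))
    return {'L1': 1 - mu3, 'L3': mu3}

lo, hi = p_bounds(10)     # for A = 10: roughly 0.123 < p < 0.9
freqs = mu_star(10, 0.5)  # mu*(L3) = 1/4.5, about 22% of the population
```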
Characterization of a Nash Equilibrium
Intuition: Why σ* Is an Equilibrium
If p is not too low, a heterogeneous population is stable:
- L1 fares better against an L3 opponent ("commitment" induces cooperation at horizon 3).
- L3 fares better against an L1 opponent (it defects at horizon 2).
- A "Hawk-Dove"-like game: a unique frequency balances the payoffs.

If p is not too high, there is no "arms race" for higher abilities:
- The optimal play of L>3 agents is to mimic L3's play.
- L2 is strictly dominated by L3 (worse payoff against L3, same against L1).

(Details in the appendix.)
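To illustrate the first two bullets, here is a sketch of one match between an L1 and an L3 agent under b*, simplifying by fixing a commonly known length T and assuming the L3 agent has observed the L1 opponent (so he cooperates at horizon 3 and defects only at the last two rounds):

```python
def play_match(T, A=10):
    """L1 vs. L3 under b*: perfect tit-for-tat (defect iff the two
    previous actions differed), overridden near the known end."""
    stage = {('C', 'C'): (A, A), ('C', 'D'): (0, A + 1),
             ('D', 'C'): (A + 1, 0), ('D', 'D'): (1, 1)}
    a1_prev = a3_prev = 'C'
    pay1 = pay3 = 0
    for t in range(1, T + 1):
        horizon = T - t + 1
        tft = 'D' if a1_prev != a3_prev else 'C'
        a1 = 'D' if horizon <= 1 else tft  # L1: defect at the last round
        a3 = 'D' if horizon <= 2 else tft  # L3: defect at the last two rounds
        p1, p3 = stage[(a1, a3)]
        pay1, pay3 = pay1 + p1, pay3 + p3
        a1_prev, a3_prev = a1, a3
    return pay1, pay3
```

With T = 10 and A = 10 the pair cooperates for eight rounds; at horizon 2 the L3 agent defects against the still-cooperating L1 agent and collects A + 1, so L3's total exceeds L1's, as in the second bullet.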
Evolutionary Stability
Evolutionary Stability
Evolutionarily Stable Strategy
A Nash equilibrium (1) allows arbitrary play off the equilibrium path, and (2) may be dynamically unstable.

Definition (Maynard Smith & Price, 1973)
A symmetric Nash equilibrium σ is an evolutionarily stable strategy (ESS) if for every other best reply σ′: u(σ, σ′) > u(σ′, σ′).

Interpretation: if σ is adopted by the population, it cannot be invaded by any alternative strategy σ′ that is initially rare.
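For finite symmetric games the ESS conditions can be checked mechanically; a generic sketch, not the paper's auxiliary game, testing pure-strategy mutants (which suffices for the 2x2 examples used here):

```python
def u(s, t, payoff):
    """Expected payoff of mixed strategy s against mixed strategy t."""
    n = len(s)
    return sum(s[i] * payoff[i][j] * t[j] for i in range(n) for j in range(n))

def is_ess(sigma, payoff, tol=1e-9):
    """Maynard Smith & Price conditions against pure mutants e:
    (1) sigma is a best reply to itself; (2) every alternative best
    reply does strictly worse against itself than sigma does against it."""
    n = len(sigma)
    base = u(sigma, sigma, payoff)
    for i in range(n):
        e = [1.0 if j == i else 0.0 for j in range(n)]
        if all(abs(e[j] - sigma[j]) <= tol for j in range(n)):
            continue  # skip the incumbent strategy itself
        gain = u(e, sigma, payoff) - base
        if gain > tol:
            return False  # sigma is not a Nash equilibrium
        if abs(gain) <= tol and u(sigma, e, payoff) <= u(e, e, payoff) + tol:
            return False  # an alternative best reply can invade
    return True

# Hypothetical Hawk-Dove payoffs: only the interior mix passes both tests.
hd = [[1.0, 3.0], [4.0, 0.0]]
```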
Evolutionary Stability
Limit ESS
ESSs almost never exist in repeated games, due to equivalent strategies that differ only off the equilibrium path. In particular, there is no ESS in the repeated Prisoner's Dilemma (Lorberbaum, 1994).

Definition (Selten, 1983)
σ is a limit ESS if it is the limit of ESSs of perturbed games as the perturbations converge to 0.
Perturbed game: a game with minimal probabilities of choosing each action a at each information set h.
Evolutionary Stability
Stability Result
Assumption: better abilities entail increasing cognitive costs c(Lk).

Theorem
If c(L4) > c(L3), then σ* is a limit ESS (for all A/(A-1)^2 < p < (A-1)/A).

Remarks:
- σ* is stable for any converging sequence of perturbed games.
- Without the costs c(Lk): the set of strategies similar to σ*, in which L≥3 mimic L3's behavior, is evolutionarily stable (Thomas, 1985).

Question: is σ* the only stable strategy?
Evolutionary Stability
"Folk-Theorem" Result

Proposition (all types and all rates of cooperation are stable)
For any p > 0, Lk and r, if δ is sufficiently high, then there exists a limit ESS in which µ(Lk) = 1 and players cooperate with frequency close to r.

Sketch of proof:
- Lk vs. Lk: follow a cycle with cooperation frequency r when uninformed; defect otherwise.
- Lk vs. Lk′: a different cycle that yields more to Lk and less to Lk′.

Always defecting is a stable outcome for every p and δ (and is the unique outcome if p = 0).
Is σ* unique within a plausible subset of stable strategies?
Uniqueness
Uniqueness
Early Niceness

Definition: a strategy is early-nice if each player cooperates when (1) the horizon is unknown or sufficiently large, and (2) no one has ever defected before.

Remarks:
1. The focus is on "nice" incumbents; there are no restrictions on mutants.
2. Equivalent definition: efficiency + non-discrimination against mutants:
   - Efficient play at large horizons, also if one of the players "trembles" and chooses a different ability.
   - Motivation for efficiency: "secret handshake" (Robson, 1990).
3. Fits experimentally observed behavior (e.g., Selten & Stoecker, 1986).

(Further motivation appears in the appendix.)
Uniqueness
Result

Theorem (Uniqueness of σ*)
Let A > 3. Any early-nice limit ESS is realization-equivalent to σ* (i.e., it may differ only off the equilibrium path).

Sketch of proof: in the appendix (parts 1/2 and 2/2), together with results for weaker solution concepts.

Intuition: let Lk be the lowest incumbent ability; everyone must defect during the last k rounds.
- If only Lk is present: "mutants" with ability Lk+1 invade.
- If Lk & Lk+1: Lk is outperformed.
- If Lk & L≥k+3: unstable to invasions of abilities in between.
- If k > 1: "mutants" with ability L1 invade.
Uniqueness
Graphical Representation of Results
(Figure omitted.)
Discussion
Discussion
Extensions
1. Adding a far-sighted L∞ ability.
2. Allowing players to send false signals:
   - "Cheap talk": always defecting is the unique outcome.
   - The results can be extended to a setup with costly lies:
     - At stage 0, a player chooses his true ability and a deception effort.
     - The efforts determine the probability of observing the opponent's true ability.
3. A setup with several games: the results hold if games in which looking far ahead decreases efficiency (like Centipede and social-dilemma games) are sufficiently frequent.
Discussion
Related Literature (1)
- Level-k evolution in one-shot games (Stahl, 1993; Stennek, 2000; Mohlin, 2012).
  - My model: 0 < p < 1 leads to a qualitatively different result.
- Uncertainty about the final period (Samuelson, 1987; Neyman, 1999) and limited foresight (Jehiel, 2001; Mengel, 2012) can induce cooperation in the repeated Prisoner's Dilemma.
  - My model: limited foresight is part of the result, not an assumption.
Discussion
Related Literature (2)
- Evolution of preferences (Güth & Yaari, 1992; Dekel et al., 2007): a high level of observability can lead to a homogeneous population of cooperative players.
  - My model: moderate partial observability induces a heterogeneous population of cooperators and shirkers.
- Other related papers: complex sequential problems (Geanakoplos & Gray, 1991); co-existence of sophisticated and naive agents (Crawford, 2003).
Discussion
Question: why an uncertain horizon that becomes certain only when it enters the agent's foresight?
Interpretation: the "physical" interaction is finite; the horizon is infinite as long as the last period is not part of the strategic considerations.
"A key criterion that determines whether we should use a model with a finite or an infinite horizon is whether the last period enters explicitly into the players' strategic considerations." (Osborne & Rubinstein, 1994)
Similar results can be obtained in a model with a fixed length.
Discussion
Question: why do we interpret Lk as limited foresight? An alternative "myopic" notion: L′k evaluates longer games as having horizon k.

How do bounded agents play long zero-sum games (e.g., chess)?
- Bounded minimax algorithm: look several steps ahead, and use a heuristic function to evaluate non-final positions.

In non-zero-sum repeated games, a position = a history:
- The myopic notion uses a constant evaluation.
- Evaluations should be history-dependent (and simple).
- In my model, the evaluation relies on the infinite-horizon benchmark.
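The bounded minimax idea can be sketched as a standard depth-limited search; this is a generic illustration, not the paper's model, and the tiny game tree below is hypothetical:

```python
def bounded_minimax(state, depth, maximizing, children, evaluate, is_final):
    """Look `depth` steps ahead; evaluate non-final positions with a
    heuristic function instead of searching to the end of the game."""
    if is_final(state) or depth == 0:
        return evaluate(state)
    values = [bounded_minimax(c, depth - 1, not maximizing,
                              children, evaluate, is_final)
              for c in children(state)]
    return max(values) if maximizing else min(values)

# Hypothetical two-ply game tree with heuristic values for inner nodes:
TREE = {'root': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
VALS = {'a': 4, 'b': 1, 'a1': 3, 'a2': 5, 'b1': 2, 'b2': 9}

children = lambda s: TREE.get(s, [])
is_final = lambda s: s not in TREE
evaluate = lambda s: VALS.get(s, 0)

deep = bounded_minimax('root', 2, True, children, evaluate, is_final)     # 3
shallow = bounded_minimax('root', 1, True, children, evaluate, is_final)  # 4
```

Cutting the search off early (depth 1) makes the agent rely on the heuristic values of intermediate positions, which is exactly how a shorter foresight changes behavior.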
Discussion
Conclusion
Summary
- Moderate partial observability induces a stable heterogeneous population of agents who look a few steps ahead and cooperate until the last few rounds.
- Everyone obtains the same maximal payoff.
- Efficient play at early stages implies uniqueness.
- The insights may be applicable to other biases (e.g., level-k).
(Informal) Motivation for Early Niceness
1. In 10 rounds of PD in the lab, most subjects defect only at the last few rounds (Selten & Stoecker, 1986; Andreoni & Miller, 1993; Cooper et al., 1996).
2. Tournaments of PD among algorithms (Axelrod, 1984; Wu & Axelrod, 1995).
3. Robson (1990): "secret-handshake" mutants can take a population from an inefficient ESS to an efficient ESS.
4. Early cooperation is the unique outcome in related setups:
   - A long finite repeated PD with a few "crazy" players (Kreps et al., 1982).
   - δ = 1 and small noise / complexity costs (Fudenberg & Maskin, 1990; Binmore & Samuelson, 1992).
Sketch of Proof – 1/2 (Theorem 1)
Given b*, L1 & L3 play the following reduced game (only the payoffs at horizons 2-3 are presented; the other payoffs are the same):

              L1              L3
    L1     2·A, 2·A        A, 2·A+1
    L3     2·A+1, A       f(p), f(p)

f(p) is decreasing in p, and f(p) < A ⇔ p > 1/(A-1) ⇔ the reduced game is a Hawk-Dove game.
µ*(L3) is the unique frequency that balances the payoffs of L1 & L3.
Sketch of Proof – 2/2 (Theorem 1)
b* is a best reply for all abilities:
- At uncertain horizons: Lorberbaum et al. (2002).
- Against strangers (µ*(L1) > 1/A ⇔ p > A/(A-1)^2): defection at horizon 3 is better against L3 and worse against L1.
- Against an observed L3 (p < (A-1)/A): defection at horizon 4 is better (worse) against an observing (unobserving) opponent.

Other abilities cannot yield higher payoffs:
- L2 is strictly dominated by L3 (worse payoff against L3, same against L1).
- L>3 cannot improve on L3's optimal play.
Sketch of Proof 1/2 (Theorem 2)
- p > ... ⇒ the smallest incumbent ability must be L1.
- Assume that all incumbents cooperate at horizons > 2 against cooperative strangers:
  - p < ... ⇒ all incumbents must cooperate at horizons > 3 against all cooperative opponents.
  - The incumbents must include only L1 and L3 ⇒ equivalent to σ*.
Sketch of Proof 2/2 (Theorem 2)
Assume the opposite: some incumbents defect at horizons > 2 against cooperative strangers.
⇒ µ(L1) < 2/(A+1) (otherwise, L1 outperforms).
⇒ p > ... ⇒ µ(L2) > 0.
⇒ µ(L>2) < 1/(A·p) (otherwise, L2 are outperformed by L1).
⇒ Everyone cooperates at horizons > 3 against cooperative strangers.
⇒ Everyone cooperates at horizons > 4.
Balanced payoffs imply a unique frequency of the abilities L1, L2 & L≥4; this is unstable to small "group" perturbations.
Uniqueness Results for Weaker Solution Concepts
1. Symmetric perfect equilibrium → a heterogeneous population of L1 and a subset of {L2, L3, L4}.
2. Neutrally stable strategy → a (possibly) shifted σ* of Lk and Lk+2.
3. Perfect + neutrally stable → σ* is unique.