Three Steps Ahead
Yuval Heller Nuffield College & Department of Economics, Oxford
January 2014
Introduction
Background
Evidence suggests that people deviate systematically from payoff-maximizing behavior; these biases have economic implications.
- In some cases, the biases cannot be attributed to complexity costs.
- There is substantial heterogeneity of the biases in the population.

Research approach:
- People base their choices on heuristics (rules of thumb).
- Different heuristics "compete" in a process of cultural learning.
- Understanding how biases survive these competitive forces can improve our understanding of the biases and their implications.
Limited Foresight
Stylized facts:
- People look only a few steps ahead in long games.
- Some subjects systematically look fewer steps ahead than others.

Examples:
- Finitely repeated Prisoner's Dilemma (Selten & Stoecker, 1986).
- Sequential bargaining (Johnson et al., 2002).

Questions
1. How do the naive agents survive?
2. Why is there no "arms race" for better foresight?
Research Objectives
1. Characterize a stable state in which all agents have short foresight, and some agents look further ahead than others.
2. Under which additional assumptions is this stable state unique?
3. Provide a novel explanation for cooperative behavior in long finite games.
   - All types obtain the same maximal payoff (unlike Kreps et al., 1982).
Outline
1. Model
2. Characterization of a Stable State
3. Uniqueness
4. Discussion
Model
Evolutionary Dynamics
A large population of agents interacts in a repeated Prisoner's Dilemma. An agent's type determines: (1) foresight ability, (2) behavior. Foresight abilities are partially observable. More successful types become more frequent (payoff-monotonic dynamics), via cultural learning or biological evolution.
- E.g., replicator dynamics: the number of offspring is proportional to payoffs.

Rarely, a few agents experiment with a new type ("mutants"). Objective: characterize stable states of the population in the long run.
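As a concrete illustration of payoff-monotonic dynamics, here is a minimal sketch of one replicator-dynamics step (the function and parameter names are mine, not from the talk):

```python
import numpy as np

def replicator_step(freqs, payoffs, dt=0.01):
    """One Euler step of the replicator dynamics: each type's frequency
    grows in proportion to the gap between its expected payoff and the
    population-average payoff."""
    fitness = payoffs @ freqs          # expected payoff of each type
    avg = freqs @ fitness              # population-average payoff
    new = freqs + dt * freqs * (fitness - avg)
    return new / new.sum()             # renormalize against numerical drift
```

Under this dynamic a type whose payoff exceeds the population average gains frequency, which is exactly the monotonicity property the stability analysis relies on.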
Types and Populations
The type of each agent has two components:
- Foresight ability {L1, L2, ..., Lk, ...}: how early the agent becomes aware of the final period of the game and its strategic implications.
- Behavior in the repeated Prisoner's Dilemma:
  - In which situations the agent cooperates, and in which he defects.

State of the population: a distribution over the set of types.
- Incumbents: types with positive frequency.
Stable States
Definition (à la Maynard-Smith & Price, 1973). A state is stable if it satisfies three properties:
1. Balance: all incumbents get the same expected payoff.
2. Internal stability: if one fraction of the population becomes more frequent, its payoff decreases.
3. External stability: if a "mutant" type enters the population, it is outperformed.
Game
Agents are randomly matched and play a repeated Prisoner's Dilemma. The game has a random length T (geometric distribution):
- At each round there is a continuation probability δ close to 1.

The stage-game payoffs are (A > 1):

            C           D
    C     A, A       0, A+1
    D   A+1, 0        1, 1
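The stage game and the random length can be sketched in a few lines (a minimal simulation; the names and the choice A = 10 as a running example are mine):

```python
import random

A = 10  # stage-game parameter, A > 1

# Payoffs to the row player: mutual cooperation pays A, unilateral
# defection pays A + 1, being exploited pays 0, mutual defection pays 1.
STAGE = {('C', 'C'): A, ('C', 'D'): 0, ('D', 'C'): A + 1, ('D', 'D'): 1}

def draw_length(delta, rng):
    """Game length T: after each round the game continues with probability
    delta, so T is geometric with mean 1 / (1 - delta)."""
    t = 1
    while rng.random() < delta:
        t += 1
    return t

rng = random.Random(0)
mean_t = sum(draw_length(0.9, rng) for _ in range(10_000)) / 10_000
# mean_t is close to 1 / (1 - 0.9) = 10
```

With δ close to 1 the expected length 1/(1 − δ) is large, which is why early play is effectively an infinite-horizon interaction for agents who are not yet aware of the end.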
Information Structure
An agent with ability Lk is informed about the realized length k rounds before the end.
- Horizon: the number of remaining stages.

Partial observability of abilities (à la Dekel, Ely & Yilankaya, 2007):
- Each player observes the opponent's ability with probability p.
- With probability 1 − p he receives a non-informative signal: the opponent is a "stranger".

All signals are private.
Characterization of a Stable State
Definition (Population state σ∗)
Two incumbent abilities: L1 and L3; the frequency of L3 is 1/(p·(A−1)).
Deterministic simple behavior:
- Everyone plays perfect tit-for-tat at unknown horizons:
  - Defect iff the players played different actions in the previous round.
- L1 agents: defect at the last round.
- L3 agents: defect at the last two rounds.
  - At horizon 3: play perfect tit-for-tat against strangers and L1; defect otherwise.
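The behavior above can be written as a small decision rule (a sketch; the encoding of signals, horizons, and histories is mine):

```python
def sigma_star_action(ability, horizon, opp_signal, my_prev, opp_prev):
    """Action ('C' or 'D') under sigma*.
    ability: 1 or 3; horizon: remaining rounds once the agent is aware of
    the end (horizon <= ability), else None; opp_signal: 'L1', 'L3', or
    'stranger'; my_prev/opp_prev: previous-round actions, None in round 1."""
    # Perfect tit-for-tat: defect iff the players chose different actions
    # in the previous round.
    tft = 'D' if my_prev is not None and my_prev != opp_prev else 'C'
    if horizon is None:            # the final period is not yet in view
        return tft
    if ability == 1:               # L1: defect at the last round
        return 'D' if horizon <= 1 else tft
    if horizon <= 2:               # L3: defect at the last two rounds
        return 'D'
    if horizon == 3:               # tit-for-tat vs strangers and L1 only
        return tft if opp_signal in ('stranger', 'L1') else 'D'
    return tft
```

Note how the rule discriminates only at horizon 3: against an observed L3 it defects one round earlier, which is what drives the "Hawk-Dove" balance between the two abilities.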
Theorem
σ∗ is a stable population state for all A/(A−1)² < p < (A−1)/A (for A = 10: roughly 10% < p < 90%).
Intuition: Why σ∗ is Stable
Internal stability holds if p is not too low:
- L1 fares better against an L3 opponent ("commitment" induces cooperation at horizon 3).
- L3 fares better against an L1 opponent (defects at horizon 2).
- A unique frequency balances the payoffs and induces internal stability (a "Hawk-Dove"-like game).

External stability holds if p is not too high:
- The optimal play of L>3 agents is to mimic L3's play.
- L2 is strictly dominated by L3 (worse payoff against L3, same against L1).
Uniqueness
Many States are Stable ("Folk Theorem")
Proposition (all types and all rates of cooperation are stable). For any p, k, r, if δ is close to 1, then there exists a stable state in which all incumbents have ability Lk and they cooperate with frequency close to r.

Stability relies on "discriminating" against the mutants:
- Lk vs. Lk: when uninformed, cooperate with frequency r; when informed, defect.
- Lk vs. Lk′: a different cycle that yields more to Lk and less to Lk′.

Is σ∗ unique in a plausible subset of stable strategies?
Early-Niceness
Definition. A state is early-nice if each incumbent cooperates when: (1) the horizon is unknown or sufficiently large; and (2) no one has ever defected before.
Remark: the focus is on "nice" incumbents; there are no restrictions on mutants.
Equivalent definition: efficiency + non-discrimination against mutants:
- Efficient play at large horizons, also if one of the players "trembles" and chooses a different ability.

Further motivation:
- Motivation for efficiency: the "secret handshake" (Robson, 1990).
- Fits experimentally observed behavior (e.g., Selten & Stoecker, 1986).
Result
Theorem (Uniqueness of σ∗). Let A > 3. Any early-nice stable state is realization-equivalent to σ∗ (i.e., it has the same frequency of types and the same observable behavior; only behavior following zero-probability events may differ). A sketch of the proof appears in the backup slides.

Intuition. Let Lk be the lowest incumbent ability. Everyone must defect during the last k rounds.
- If only Lk is present: "mutants" with ability Lk+1 invade.
- If Lk & Lk+1: Lk is outperformed.
- If Lk & L≥k+3: unstable to invasions of abilities in between.
- If k > 1: "mutants" with ability L1 invade.
Graphical Representation of Results
[Figure omitted: graphical representation of the results.]
Discussion
Extensions
1. Having a far-sighted L∞ ability.
2. Allowing players to send false signals:
   - "Cheap talk": always defecting is the unique outcome.
   - The results can be extended to a setup with costly lies:
     - Each player chooses a deception effort and a fake ability.
     - The efforts determine the probability of observing the opponent's true ability.
3. A setup with several games: the results hold if games in which looking far ahead decreases efficiency (e.g., Centipede, social dilemma games) are sufficiently frequent.
Related Literature (1)
Kreps et al. (1982): a few "crazy" tit-for-tat players can induce cooperation in the finitely repeated Prisoner's Dilemma.
- My model: everyone obtains the same maximal payoff.

Evolution of preferences (Güth & Yaari, 1992; Dekel et al., 2007): a high level of observability can lead to a homogeneous population of cooperative players.
- My model: moderate partial observability induces a heterogeneous population of cooperators and shirkers.
Related Literature (2)
Level-k evolution in one-shot games (Stahl, 1993; Stennek, 2000; Mohlin, 2012).
- My model: 0 < p < 1 leads to a qualitatively different result.

Uncertainty about the final period (Samuelson, 1987; Neyman, 1999) and limited foresight (Jehiel, 2001; Mengel, 2012) can induce cooperation in the repeated Prisoner's Dilemma.
- My model: limited foresight is part of the result (not an assumption).
Conclusion
Summary: moderate partial observability induces a stable heterogeneous population of agents who look a few steps ahead and cooperate until the last few rounds. Efficient play at early stages implies uniqueness. The insights may be applicable to other biases (e.g., level-k reasoning).
Question: why an uncertain horizon that becomes certain only once it enters the agent's foresight?
Interpretation: the "physical" interaction is finite; the horizon is infinite as long as the last period is not part of the strategic considerations. "A key criterion that determines whether we should use a model with a finite or an infinite horizon is whether the last period enters explicitly into the players' strategic considerations." (Osborne & Rubinstein, 1994)
Similar results can be obtained in a model with a fixed length.
Question: why do we interpret Lk as limited foresight? An alternative "myopic" notion: L′k evaluates longer games as having horizon k.
How do bounded agents play long zero-sum games (e.g., chess)?
- Bounded minimax algorithm: look several steps ahead, and use a heuristic function to evaluate non-final positions.

In non-zero-sum repeated games: position = history.
- The myopic notion uses a constant evaluation.
- Evaluations should be history-dependent (and simple).
- In my model: the evaluation relies on the infinite-horizon benchmark.
Static Stability Analysis

Payoff-monotonic population dynamics         Static auxiliary game
Random matching in a single population   →   Symmetric 2-player game
Feasible types                           →   Feasible actions
State of the population                  →   Mixed strategy
Necessary condition for stability        →   Symmetric Nash equilibrium
Stable state in a broad set of dynamics  →   Evolutionarily stable strategy

Stability is robust to the presence of sophisticated agents.
References: Nash (1950, thesis), Maynard-Smith & Price (1973), Taylor & Jonker (1978), Thomas (1985), Bomze (1986), Cressman (1990, 1997), Sandholm (2010), ...
Evolutionarily Stable Strategy
Nash equilibrium: (1) allows arbitrary play off the equilibrium path; and (2) may be dynamically unstable.
Definition (Maynard-Smith & Price, 1973). A symmetric Nash equilibrium σ is an evolutionarily stable strategy (ESS) if for each other best reply σ′: u(σ, σ′) > u(σ′, σ′).
Interpretation: if σ is adopted by the population, it cannot be invaded by any alternative strategy that is initially rare.
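The ESS condition can be checked numerically for a small symmetric game. A minimal sketch using the classic Hawk-Dove game (the payoff numbers V = 2, C = 4 are illustrative and not from the talk; the grid of mutants is a numerical stand-in for quantifying over all alternative strategies):

```python
import numpy as np

def u(x, y, M):
    """Expected payoff to a player using mixed strategy x against y."""
    return x @ M @ y

def is_ess(sigma, M, grid=101, tol=1e-9):
    """Maynard-Smith & Price's condition, checked against a grid of
    2-action mutants: no mutant may beat sigma, and against every
    alternative best reply s, sigma must satisfy u(sigma, s) > u(s, s)."""
    for q in np.linspace(0.0, 1.0, grid):
        s = np.array([q, 1 - q])
        if np.allclose(s, sigma):
            continue
        if u(s, sigma, M) > u(sigma, sigma, M) + tol:
            return False               # s is a strictly better reply
        if abs(u(s, sigma, M) - u(sigma, sigma, M)) <= tol \
                and u(sigma, s, M) <= u(s, s, M) + tol:
            return False               # alternative best reply not repelled
    return True

# Hawk-Dove with value V = 2 and fight cost C = 4; the mixed ESS plays
# Hawk with probability V / C = 0.5.
M = np.array([[-1.0, 2.0], [0.0, 1.0]])
sigma = np.array([0.5, 0.5])
```

Here the interior mixture passes the check while both pure strategies fail it, mirroring the Hawk-Dove logic behind the internal stability of σ∗.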
Limit ESS
ESSs almost never exist in repeated games due to equivalent strategies, which differ only off the equilibrium path. There is no ESS in the repeated Prisoner's Dilemma (Lorberbaum, 1994).
Definition (Selten, 1983). σ is a limit ESS if it is a limit of ESSs of perturbed games as the perturbations converge to 0. A perturbed game is a game with minimal probabilities of choosing each action a at each information set h.
(Informal) Motivation for Early-Niceness
1. In 10 rounds of Prisoner's Dilemma in the lab, most subjects defect only at the last few rounds (Selten & Stoecker, 1986; Andreoni & Miller, 1993; Cooper et al., 1996).
2. Tournaments of Prisoner's Dilemma among algorithms (Axelrod, 1984; Wu & Axelrod, 1995).
3. Robson (1990): "secret-handshake" mutants can take a population from an inefficient ESS to an efficient ESS.
4. Early cooperation is the unique outcome in related setups:
   - A long finitely repeated Prisoner's Dilemma with a few "crazy" players (Kreps et al., 1982).
   - δ = 1 and small noise / complexity costs (Fudenberg & Maskin, 1990; Binmore & Samuelson, 1992).
Sketch of Proof, 1/2 (Theorem 1)
L1 and L3 play the following reduced game (given b∗); only the payoffs at horizons 2-3 are presented, the other payoffs being the same:

             L1              L3
    L1    2A, 2A         A, 2A+1
    L3    2A+1, A      f(p), f(p)

f(p) < A ⇔ p > 1/(A−1) ⇔ Hawk-Dove game.
μ∗(L3): the unique frequency that balances the payoffs of L1 and L3.
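The balancing frequency can be verified numerically. The closed form f(p) = A + 1 − p(A − 1) is my reconstruction, chosen to be consistent with the stated threshold f(p) < A ⇔ p > 1/(A − 1); the parameter values are illustrative:

```python
# Reduced-game payoffs over horizons 2-3 (row player's total):
#   u(L1, L1) = 2A,     u(L1, L3) = A,
#   u(L3, L1) = 2A + 1, u(L3, L3) = f(p).
# f(p) = A + 1 - p * (A - 1) is a reconstruction consistent with the
# stated threshold f(p) < A iff p > 1 / (A - 1).
A, p = 10.0, 0.5

f = A + 1 - p * (A - 1)
mu = 1 / (p * (A - 1))                    # stated frequency of L3 in sigma*

u_L1 = (1 - mu) * 2 * A + mu * A          # expected payoff of an L1 agent
u_L3 = (1 - mu) * (2 * A + 1) + mu * f    # expected payoff of an L3 agent
assert abs(u_L1 - u_L3) < 1e-9            # mu balances the two payoffs
```

Solving u(L1) = u(L3) for the frequency μ of L3 gives μ(A + 1 − f(p)) = 1, i.e., μ = 1/(p(A − 1)), which is exactly the frequency stated in the definition of σ∗.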
Sketch of Proof, 2/2 (Theorem 1)
b∗ is a best reply for all abilities:
- Uncertain horizon: Lorberbaum et al. (2002).
- Against strangers (μ∗(L1) > 1/A ⇔ p > A/(A−1)²): defection at horizon 3 is better against L3 and worse against L1.
- Against an observed L3 (p < (A−1)/A): defection at horizon 4 is better (worse) against an observing (unobserving) opponent.

Other abilities cannot yield higher payoffs:
- L2 is strictly dominated by L3 (worse payoff against L3, same against L1).
- L>3 cannot improve on L3's optimal play.
Sketch of Proof, 1/2 (Theorem 2)
- p > ... ⇒ the smallest incumbent ability must be L1.
- Assume that all incumbents cooperate at horizons > 2 against cooperative strangers.
  - p < ... ⇒ all incumbents must cooperate at horizons > 3 against all cooperative opponents.
  - The incumbents must include only L1 and L3 ⇒ equivalent to σ∗.
Sketch of Proof, 2/2 (Theorem 2)
Assume the opposite: some incumbents defect at horizons > 2 against cooperative strangers.
- ⇒ μ(L1) < 2/(A+1) (otherwise, L1 outperforms) ⇒ p > ... ⇒ μ(L2) > 0.
- ⇒ μ(L>2) < 1/(A·p) (otherwise, the L2 agents are outperformed by L1).
- ⇒ Everyone cooperates at horizons > 3 against cooperative strangers. ⇒ Everyone cooperates at horizons > 4.
- Balanced payoffs imply a unique frequency of the abilities L1, L2 & L≥4, which is unstable to small "group" perturbations.
Uniqueness Results for Weaker Solution Concepts
1. Symmetric perfect equilibrium → a heterogeneous population of L1 and a subset of {L2, L3, L4}.
2. Neutrally stable strategy → a (possibly) shifted σ∗ of Lk and Lk+2.
3. Perfect + neutrally stable → σ∗ is unique.