Three Steps Ahead
Yuval Heller Nuffield College & Department of Economics, Oxford
January 2014
Introduction
Background
Evidence suggests that people deviate systematically from payoff-maximizing behavior; these biases have economic implications.
- In some cases, the biases cannot be attributed to complexity costs.
- There is substantial heterogeneity of the biases in the population.

Research approach:
- People base their choices on heuristics (rules of thumb).
- Different heuristics "compete" in a process of cultural learning.
- Understanding how biases survive these competitive forces can improve our understanding of the biases and their implications.
Limited Foresight
Stylized facts:
- People look only a few steps ahead in long games.
- Some subjects systematically look fewer steps ahead than others.

Examples:
- Finitely repeated Prisoner's Dilemma (Selten & Stoecker, 1986).
- Sequential bargaining (Johnson et al., 2002).

Questions
1. How do the naive agents survive?
2. Why is there no "arms race" for better foresight?
Research Objectives
1. Characterize a stable state in which all agents have short foresight, and some agents look further ahead than others.
2. Under which additional assumptions is this stable state unique?
3. Provide a novel explanation for cooperative behavior in long finite games.
   - All types obtain the same maximal payoff (unlike Kreps et al., 1982).
Outline
1. Model
2. Characterization of a Stable State
3. Uniqueness
4. Discussion
Model
Evolutionary Dynamics
A large population of agents interacts in a repeated Prisoner's Dilemma. An agent's type determines: (1) foresight ability, (2) behavior. Foresight abilities are partially observable. More successful types become more frequent (payoff-monotonic dynamics), via cultural learning or biological evolution.
- E.g., replicator dynamics: the number of offspring is proportional to payoffs.

Rarely, a few agents experiment with a new type ("mutants"). Objective: characterize stable states of the population in the long run.
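As a concrete illustration of payoff-monotonic dynamics, here is a minimal sketch of one replicator-dynamics step (the function and parameter names are mine, not from the talk):

```python
import numpy as np

def replicator_step(freqs, payoffs, dt=0.01):
    """One Euler step of the replicator dynamics: each type's frequency
    grows in proportion to the gap between its expected payoff and the
    population-average payoff."""
    fitness = payoffs @ freqs          # expected payoff of each type
    avg = freqs @ fitness              # population-average payoff
    new = freqs + dt * freqs * (fitness - avg)
    return new / new.sum()             # renormalize against numerical drift
```

Under this dynamic a type whose payoff exceeds the population average gains frequency, which is exactly the monotonicity property the stability analysis relies on.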
Types and Populations
The type of each agent has two components:
- Foresight ability {L1, L2, ..., Lk, ...}: how early the agent becomes aware of the final period of the game and its strategic implications.
- Behavior in the repeated Prisoner's Dilemma:
  - In which situations the agent cooperates, and in which he defects.

State of the population: a distribution over the set of types.
- Incumbents: types with positive frequency.
Stable States
Definition (à la Maynard-Smith & Price, 1973). A state is stable if it satisfies three properties:
1. Balance: all incumbents get the same expected payoff.
2. Internal stability: if one fraction of the population becomes more frequent, its payoff decreases.
3. External stability: if a "mutant" type enters the population, it is outperformed.
Game
Agents are randomly matched and play a repeated Prisoner's Dilemma. The game has a random length T (geometric distribution):
- At each round there is a continuation probability δ close to 1.

The stage-game payoffs are (A > 1):

            C           D
    C     A, A       0, A+1
    D   A+1, 0        1, 1
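The stage game and the random length can be sketched in a few lines (a minimal simulation; the names and the choice A = 10 as a running example are mine):

```python
import random

A = 10  # stage-game parameter, A > 1

# Payoffs to the row player: mutual cooperation pays A, unilateral
# defection pays A + 1, being exploited pays 0, mutual defection pays 1.
STAGE = {('C', 'C'): A, ('C', 'D'): 0, ('D', 'C'): A + 1, ('D', 'D'): 1}

def draw_length(delta, rng):
    """Game length T: after each round the game continues with probability
    delta, so T is geometric with mean 1 / (1 - delta)."""
    t = 1
    while rng.random() < delta:
        t += 1
    return t

rng = random.Random(0)
mean_t = sum(draw_length(0.9, rng) for _ in range(10_000)) / 10_000
# mean_t is close to 1 / (1 - 0.9) = 10
```

With δ close to 1 the expected length 1/(1 − δ) is large, which is why early play is effectively an infinite-horizon interaction for agents who are not yet aware of the end.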
Information Structure
An agent with ability Lk is informed about the realized length k rounds before the end.
- Horizon: the number of remaining stages.

Partial observability of abilities (à la Dekel, Ely & Yilankaya, 2007):
- Each player observes the opponent's ability with probability p.
- With probability 1 − p he receives a non-informative signal: the opponent is a "stranger".

All signals are private.
Characterization of a Stable State
Definition (Population state σ∗)
Two incumbent abilities: L1 and L3; the frequency of L3 is 1/(p·(A−1)).
Deterministic simple behavior:
- Everyone plays perfect tit-for-tat at unknown horizons:
  - Defect iff the players played different actions in the previous round.
- L1 agents: defect at the last round.
- L3 agents: defect at the last two rounds.
  - At horizon 3: play perfect tit-for-tat against strangers and L1; defect otherwise.
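The behavior above can be written as a small decision rule (a sketch; the encoding of signals, horizons, and histories is mine):

```python
def sigma_star_action(ability, horizon, opp_signal, my_prev, opp_prev):
    """Action ('C' or 'D') under sigma*.
    ability: 1 or 3; horizon: remaining rounds once the agent is aware of
    the end (horizon <= ability), else None; opp_signal: 'L1', 'L3', or
    'stranger'; my_prev/opp_prev: previous-round actions, None in round 1."""
    # Perfect tit-for-tat: defect iff the players chose different actions
    # in the previous round.
    tft = 'D' if my_prev is not None and my_prev != opp_prev else 'C'
    if horizon is None:            # the final period is not yet in view
        return tft
    if ability == 1:               # L1: defect at the last round
        return 'D' if horizon <= 1 else tft
    if horizon <= 2:               # L3: defect at the last two rounds
        return 'D'
    if horizon == 3:               # tit-for-tat vs strangers and L1 only
        return tft if opp_signal in ('stranger', 'L1') else 'D'
    return tft
```

Note how the rule discriminates only at horizon 3: against an observed L3 it defects one round earlier, which is what drives the "Hawk-Dove" balance between the two abilities.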
Theorem
σ∗ is a stable population state for all A/(A−1)² < p < (A−1)/A (for A = 10: roughly 10% < p < 90%).
Intuition: Why σ∗ is Stable
Internal stability holds if p is not too low:
- L1 fares better against an L3 opponent ("commitment" induces cooperation at horizon 3).
- L3 fares better against an L1 opponent (defects at horizon 2).
- A unique frequency balances the payoffs and induces internal stability (a "Hawk-Dove"-like game).

External stability holds if p is not too high:
- The optimal play of L>3 agents is to mimic L3's play.
- L2 is strictly dominated by L3 (worse payoff against L3, same against L1).
Uniqueness
Many States are Stable ("Folk Theorem")
Proposition (all types and all rates of cooperation are stable). For any p, k, r, if δ is close to 1, then there exists a stable state in which all incumbents have ability Lk and they cooperate with frequency close to r.

Stability relies on "discriminating" against the mutants:
- Lk vs. Lk: when uninformed, cooperate with frequency r; when informed, defect.
- Lk vs. Lk′: a different cycle that yields more to Lk and less to Lk′.

Is σ∗ unique in a plausible subset of stable strategies?
Early-Niceness
Definition. A state is early-nice if each incumbent cooperates when: (1) the horizon is unknown or sufficiently large; and (2) no one has ever defected before.
Remark: the focus is on "nice" incumbents; there are no restrictions on mutants.
Equivalent definition: efficiency + non-discrimination against mutants:
- Efficient play at large horizons, also if one of the players "trembles" and chooses a different ability.

Further motivation:
- Motivation for efficiency: the "secret handshake" (Robson, 1990).
- Fits experimentally observed behavior (e.g., Selten & Stoecker, 1986).
Result
Theorem (Uniqueness of σ∗). Let A > 3. Any early-nice stable state is realization-equivalent to σ∗ (i.e., it has the same frequency of types and the same observable behavior; only behavior following zero-probability events may differ). A sketch of the proof appears in the backup slides.

Intuition. Let Lk be the lowest incumbent ability. Everyone must defect during the last k rounds.
- If only Lk is present: "mutants" with ability Lk+1 invade.
- If Lk & Lk+1: Lk is outperformed.
- If Lk & L≥k+3: unstable to invasions of abilities in between.
- If k > 1: "mutants" with ability L1 invade.
Graphical Representation of Results
[Figure omitted: graphical representation of the results.]
Discussion
Extensions
1. Having a far-sighted L∞ ability.
2. Allowing players to send false signals:
   - "Cheap talk": always defecting is the unique outcome.
   - The results can be extended to a setup with costly lies:
     - Each player chooses a deception effort and a fake ability.
     - The efforts determine the probability of observing the opponent's true ability.
3. A setup with several games: the results hold if games in which looking far ahead decreases efficiency (e.g., Centipede, social dilemma games) are sufficiently frequent.
Related Literature (1)
Kreps et al. (1982): a few "crazy" tit-for-tat players can induce cooperation in the finitely repeated Prisoner's Dilemma.
- My model: everyone obtains the same maximal payoff.

Evolution of preferences (Güth & Yaari, 1992; Dekel et al., 2007): a high level of observability can lead to a homogeneous population of cooperative players.
- My model: moderate partial observability induces a heterogeneous population of cooperators and shirkers.
Related Literature (2)
Level-k evolution in one-shot games (Stahl, 1993; Stennek, 2000; Mohlin, 2012).
- My model: 0 < p < 1 leads to a qualitatively different result.

Uncertainty about the final period (Samuelson, 1987; Neyman, 1999) and limited foresight (Jehiel, 2001; Mengel, 2012) can induce cooperation in the repeated Prisoner's Dilemma.
- My model: limited foresight is part of the result (not an assumption).
Conclusion
Summary: moderate partial observability induces a stable heterogeneous population of agents who look a few steps ahead and cooperate until the last few rounds. Efficient play at early stages implies uniqueness. The insights may be applicable to other biases (e.g., level-k reasoning).
Question: why an uncertain horizon that becomes certain only once it enters the agent's foresight?
Interpretation: the "physical" interaction is finite; the horizon is infinite as long as the last period is not part of the strategic considerations. "A key criterion that determines whether we should use a model with a finite or an infinite horizon is whether the last period enters explicitly into the players' strategic considerations." (Osborne & Rubinstein, 1994)
Similar results can be obtained in a model with a fixed length.
Question: why do we interpret Lk as limited foresight? An alternative "myopic" notion: L′k evaluates longer games as having horizon k.
How do bounded agents play long zero-sum games (e.g., chess)?
- Bounded minimax algorithm: look several steps ahead, and use a heuristic function to evaluate non-final positions.

In non-zero-sum repeated games: position = history.
- The myopic notion uses a constant evaluation.
- Evaluations should be history-dependent (and simple).
- In my model: the evaluation relies on the infinite-horizon benchmark.
Static Stability Analysis

Payoff-monotonic population dynamics         Static auxiliary game
Random matching in a single population   →   Symmetric 2-player game
Feasible types                           →   Feasible actions
State of the population                  →   Mixed strategy
Necessary condition for stability        →   Symmetric Nash equilibrium
Stable state in a broad set of dynamics  →   Evolutionarily stable strategy

Stability is robust to the presence of sophisticated agents.
References: Nash (1950, thesis), Maynard-Smith & Price (1973), Taylor & Jonker (1978), Thomas (1985), Bomze (1986), Cressman (1990, 1997), Sandholm (2010), ...
Evolutionarily Stable Strategy
Nash equilibrium: (1) allows arbitrary play off the equilibrium path; and (2) may be dynamically unstable.
Definition (Maynard-Smith & Price, 1973). A symmetric Nash equilibrium σ is an evolutionarily stable strategy (ESS) if for each other best reply σ′: u(σ, σ′) > u(σ′, σ′).
Interpretation: if σ is adopted by the population, it cannot be invaded by any alternative strategy that is initially rare.
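The ESS condition can be checked numerically for a small symmetric game. A minimal sketch using the classic Hawk-Dove game (the payoff numbers V = 2, C = 4 are illustrative and not from the talk; the grid of mutants is a numerical stand-in for quantifying over all alternative strategies):

```python
import numpy as np

def u(x, y, M):
    """Expected payoff to a player using mixed strategy x against y."""
    return x @ M @ y

def is_ess(sigma, M, grid=101, tol=1e-9):
    """Maynard-Smith & Price's condition, checked against a grid of
    2-action mutants: no mutant may beat sigma, and against every
    alternative best reply s, sigma must satisfy u(sigma, s) > u(s, s)."""
    for q in np.linspace(0.0, 1.0, grid):
        s = np.array([q, 1 - q])
        if np.allclose(s, sigma):
            continue
        if u(s, sigma, M) > u(sigma, sigma, M) + tol:
            return False               # s is a strictly better reply
        if abs(u(s, sigma, M) - u(sigma, sigma, M)) <= tol \
                and u(sigma, s, M) <= u(s, s, M) + tol:
            return False               # alternative best reply not repelled
    return True

# Hawk-Dove with value V = 2 and fight cost C = 4; the mixed ESS plays
# Hawk with probability V / C = 0.5.
M = np.array([[-1.0, 2.0], [0.0, 1.0]])
sigma = np.array([0.5, 0.5])
```

Here the interior mixture passes the check while both pure strategies fail it, mirroring the Hawk-Dove logic behind the internal stability of σ∗.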
Limit ESS
ESSs almost never exist in repeated games due to equivalent strategies, which differ only off the equilibrium path. There is no ESS in the repeated Prisoner's Dilemma (Lorberbaum, 1994).
Definition (Selten, 1983). σ is a limit ESS if it is a limit of ESSs of perturbed games as the perturbations converge to 0. A perturbed game is a game with minimal probabilities of choosing each action a at each information set h.
(Informal) Motivation for Early-Niceness
1. In 10 rounds of Prisoner's Dilemma in the lab, most subjects defect only at the last few rounds (Selten & Stoecker, 1986; Andreoni & Miller, 1993; Cooper et al., 1996).
2. Tournaments of Prisoner's Dilemma among algorithms (Axelrod, 1984; Wu & Axelrod, 1995).
3. Robson (1990): "secret-handshake" mutants can take a population from an inefficient ESS to an efficient ESS.
4. Early cooperation is the unique outcome in related setups:
   - A long finitely repeated Prisoner's Dilemma with a few "crazy" players (Kreps et al., 1982).
   - δ = 1 and small noise / complexity costs (Fudenberg & Maskin, 1990; Binmore & Samuelson, 1992).
Sketch of Proof, 1/2 (Theorem 1)
L1 and L3 play the following reduced game (given b∗); only the payoffs at horizons 2-3 are presented, the other payoffs being the same:

             L1              L3
    L1    2A, 2A         A, 2A+1
    L3    2A+1, A      f(p), f(p)

f(p) < A ⇔ p > 1/(A−1) ⇔ Hawk-Dove game.
μ∗(L3): the unique frequency that balances the payoffs of L1 and L3.
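The balancing frequency can be verified numerically. The closed form f(p) = A + 1 − p(A − 1) is my reconstruction, chosen to be consistent with the stated threshold f(p) < A ⇔ p > 1/(A − 1); the parameter values are illustrative:

```python
# Reduced-game payoffs over horizons 2-3 (row player's total):
#   u(L1, L1) = 2A,     u(L1, L3) = A,
#   u(L3, L1) = 2A + 1, u(L3, L3) = f(p).
# f(p) = A + 1 - p * (A - 1) is a reconstruction consistent with the
# stated threshold f(p) < A iff p > 1 / (A - 1).
A, p = 10.0, 0.5

f = A + 1 - p * (A - 1)
mu = 1 / (p * (A - 1))                    # stated frequency of L3 in sigma*

u_L1 = (1 - mu) * 2 * A + mu * A          # expected payoff of an L1 agent
u_L3 = (1 - mu) * (2 * A + 1) + mu * f    # expected payoff of an L3 agent
assert abs(u_L1 - u_L3) < 1e-9            # mu balances the two payoffs
```

Solving u(L1) = u(L3) for the frequency μ of L3 gives μ(A + 1 − f(p)) = 1, i.e., μ = 1/(p(A − 1)), which is exactly the frequency stated in the definition of σ∗.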
Sketch of Proof, 2/2 (Theorem 1)
b∗ is a best reply for all abilities:
- Uncertain horizon: Lorberbaum et al. (2002).
- Against strangers (μ∗(L1) > 1/A ⇔ p > A/(A−1)²): defection at horizon 3 is better against L3 and worse against L1.
- Against an observed L3 (p < (A−1)/A): defection at horizon 4 is better (worse) against an observing (unobserving) opponent.

Other abilities cannot yield higher payoffs:
- L2 is strictly dominated by L3 (worse payoff against L3, same against L1).
- L>3 cannot improve on L3's optimal play.
Sketch of Proof, 1/2 (Theorem 2)
- p > ... ⇒ the smallest incumbent ability must be L1.
- Assume that all incumbents cooperate at horizons > 2 against cooperative strangers.
  - p < ... ⇒ all incumbents must cooperate at horizons > 3 against all cooperative opponents.
  - The incumbents must include only L1 and L3 ⇒ equivalent to σ∗.
Sketch of Proof, 2/2 (Theorem 2)
Assume the opposite: some incumbents defect at horizons > 2 against cooperative strangers.
- ⇒ μ(L1) < 2/(A+1) (otherwise, L1 outperforms) ⇒ p > ... ⇒ μ(L2) > 0.
- ⇒ μ(L>2) < 1/(A·p) (otherwise, the L2 agents are outperformed by L1).
- ⇒ Everyone cooperates at horizons > 3 against cooperative strangers. ⇒ Everyone cooperates at horizons > 4.
- Balanced payoffs imply a unique frequency of the abilities L1, L2 & L≥4, which is unstable to small "group" perturbations.
Uniqueness Results for Weaker Solution Concepts
1. Symmetric perfect equilibrium → a heterogeneous population of L1 and a subset of {L2, L3, L4}.
2. Neutrally stable strategy → a (possibly) shifted σ∗ of Lk and Lk+2.
3. Perfect + neutrally stable → σ∗ is unique.