Calibration and Internal No-Regret with Random Signals
Vianney Perchet, Équipe Combinatoire et Optimisation, Université Pierre et Marie Curie

October 4, 2009, 20th International Conference on Algorithmic Learning Theory


Outline

1. Introduction
2. Full Monitoring
   - From Approachability to Internal Consistency
   - From Internal Consistency to Calibration
3. Partial Monitoring
4. Conclusion and ongoing work


Introduction

Approachability: a game with vector payoffs; the average payoff converges to a given set.
⇓
No-regret: a game with real payoffs; the strategy is asymptotically as good as any constant one.
⇓
Calibration: a sequence of outcomes (e.g. in {0, 1}^N); a property of a sequence of predictions.

Full monitoring [Blackwell, Hart]: construct strategies with no regret, using approachability of a convex set.
Partial monitoring [P.]: construct strategies with no regret, using calibration (in full monitoring).



Model

A two-person repeated game with finite action spaces I (resp. J) for Player 1 (resp. Nature) and a finite signal space S.

Payoff function ρ : I × J → [−1, 1]^k, extended multilinearly to ∆(I) × ∆(J).
Signal function s : I × J → ∆(S), extended likewise.

Stage n: P1 and Nature choose i_n and j_n. P1 gets ρ_n = ρ(i_n, j_n) and observes s_n, whose law is s(i_n, j_n).

Strategies:
P1: σ = (σ_n)_{n∈N}, with σ_n : (I × S)^n → ∆(I).
Nature: τ = (τ_n)_{n∈N}, with τ_n : (I × J × S)^n → ∆(J).
P_{σ,τ} denotes the probability generated by (σ, τ) on (I × J × S)^N.

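The stage dynamics of the model can be sketched in a few lines of Python. This is a minimal illustration; the array layout and the names `payoff`, `signal_law`, `play_stage` are our own conventions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_stage(i, j, payoff, signal_law):
    """One stage of the repeated game (illustrative sketch).

    payoff:     array of shape (I, J, k), payoff[i, j] = rho(i, j) in [-1, 1]^k.
    signal_law: array of shape (I, J, S), signal_law[i, j] = s(i, j) in Delta(S).
    Returns the vector payoff and a signal drawn from s(i, j); under
    partial monitoring P1 observes only the signal, not rho_n or j_n.
    """
    rho = payoff[i, j]
    signal = rng.choice(signal_law.shape[2], p=signal_law[i, j])
    return rho, signal
```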


Full Monitoring: Approachability

Average payoff at stage n: ρ̄_n = (1/n) ∑_{m=1}^{n} ρ_m.

Definition (Approachability). A closed convex set C ⊂ R^k is approachable by P1 if for every ε > 0 there exist a strategy σ of P1 and N ∈ N such that, for every strategy τ of Nature and all n ≥ N:

E_{σ,τ}[d_C(ρ̄_n)] ≤ ε and P_{σ,τ}(∃ n ≥ N, d_C(ρ̄_n) > ε) < ε,

where d_C(x) = inf_{c∈C} ‖x − c‖.

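For the negative orthant, the target set used for regret later in the talk, the distance d_C has a closed form. A minimal sketch (the function name is ours):

```python
import numpy as np

def dist_to_negative_orthant(x):
    """Euclidean distance from x to C = R_-^k.

    The projection onto the negative orthant is the componentwise
    minimum with 0, so d_C(x) is simply the norm of the positive
    part of x.
    """
    return float(np.linalg.norm(np.maximum(x, 0.0)))
```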


Full Monitoring: Internal No-Regret

Here S = J: P1 observes the actions played by Nature, and payoffs are real (ρ : I × J → R).

Instantaneous internal regret at stage n: the I × I matrix R_n whose i_n-th row is (ρ(1, j_n) − ρ(i_n, j_n), …, ρ(I, j_n) − ρ(i_n, j_n)) and whose other rows are zero.

Average internal regret at stage n, an I × I matrix:

R̄_n^{ik} = (|N_n(i)|/n) ( ρ(k, j̄_n(i)) − ρ(i, j̄_n(i)) ), for all i, k ∈ I.

Definition (Internal consistency). A strategy σ of P1 is internally consistent if for every strategy τ:

lim sup_{n→∞} R̄_n ∈ R_−^{I×I}, P_{σ,τ}-a.s.,

i.e. for every i, either the frequency |N_n(i)|/n goes to zero, or i is a best response to j̄_n(i), the empirical distribution of Nature's actions on the stages where P1 played i.

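The average internal regret matrix is straightforward to compute from the play history. A hedged sketch (array shapes and names are our own conventions):

```python
import numpy as np

def average_internal_regret(payoff, i_hist, j_hist):
    """Average internal regret matrix after n stages (illustrative sketch).

    payoff: array of shape (I, J) with payoff[i, j] = rho(i, j).
    i_hist, j_hist: actions played by P1 and Nature.
    Entry (i, k) is (1/n) * sum, over the stages where P1 played i, of
    rho(k, j_m) - rho(i, j_m); internal consistency asks that every
    entry be asymptotically non-positive.
    """
    n = len(i_hist)
    R = np.zeros((payoff.shape[0], payoff.shape[0]))
    for i_m, j_m in zip(i_hist, j_hist):
        R[i_m, :] += payoff[:, j_m] - payoff[i_m, j_m]
    return R / n
```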

From Approachability to Internal Consistency

Auxiliary game with vector payoffs in R^{I×I}, action spaces I and J, and vector payoff R_n at stage n.

Approachability implies consistency [Hart, Mas-Colell]. If P1 has a strategy σ that approaches R_−^{I×I} in this auxiliary game, then σ is internally consistent.

Construction of the strategy: at stage n + 1, P1 plays x_{n+1} ∈ ∆(I), an invariant measure of R̄_n^+, so that

⟨ E_{x_{n+1}}[R_{n+1}] − Π_{I×I}(R̄_n), R̄_n^+ ⟩ = 0,

with Π_{I×I} the projection onto R_−^{I×I}. Therefore [Blackwell 56], R_−^{I×I} is approachable with this strategy.

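The invariant measure in the Hart and Mas-Colell step can be obtained by solving a linear system, as the concluding slide notes. A hedged numerical sketch, treating the positive parts of the average regret as transition rates of a Markov chain (the normalization and the power iteration are implementation choices of ours):

```python
import numpy as np

def invariant_measure(R_bar, n_iter=1000):
    """Invariant measure of the positive-part regret matrix (sketch).

    Returns x in Delta(I) satisfying, for every k,
        sum_i x_i * R+[i, k] = x_k * sum_j R+[k, j],
    computed as the stationary distribution of the Markov chain whose
    off-diagonal transition probabilities are proportional to R+.
    """
    A = np.maximum(R_bar, 0.0)
    np.fill_diagonal(A, 0.0)
    mu = A.sum(axis=1).max() + 1.0        # > max rate, keeps diagonal positive
    Q = A / mu
    np.fill_diagonal(Q, 1.0 - Q.sum(axis=1))  # row-stochastic completion
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(n_iter):               # power iteration on x Q = x
        x = x @ Q
    return x / x.sum()
```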

From Internal Consistency to Calibration

At stage n, Nature chooses an outcome ω_n ∈ Ω; Player 1 predicts it by announcing p_n ∈ {ω(1), …, ω(L)} ⊂ ∆(Ω).

Definition (Calibration) [Dawid]. A strategy σ : ⋃_{n∈N} (L × Ω)^n → ∆(L) of P1 is (L)-calibrated if for every strategy τ of Nature and all l, k ∈ L:

lim sup_{n→∞} (|N_n(l)|/n) ( ‖ω̄_n(l) − ω(l)‖² − ‖ω̄_n(l) − ω(k)‖² ) ≤ 0, P_{σ,τ}-a.s.,

where N_n(l) = {m ≤ n : l_m = l} and ω̄_n(l) = ∑_{m∈N_n(l)} ω_m / |N_n(l)|.

Internal consistency implies calibration [Foster-Vohra] [Sorin]. If P1 has a strategy σ that is internally consistent in the game with payoff function ρ : L × Ω → R, ρ(l, ω) = −‖ω − ω(l)‖²₂, then it is calibrated.

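The calibration criterion can be checked empirically on a finite history. A hypothetical sketch (names and data layout are ours) that scores each announced grid point l against the best alternative:

```python
import numpy as np

def calibration_scores(pred_idx, outcomes, grid):
    """Empirical calibration scores per prediction point (sketch).

    pred_idx: for each stage, the index l of the announced grid point.
    outcomes: the observed outcomes omega_m, as arrays.
    grid:     the prediction points omega(1), ..., omega(L).
    Score of l: (|N_n(l)|/n) * (||avg - omega(l)||^2 - min_k ||avg - omega(k)||^2),
    where avg is the average outcome on the stages where l was announced.
    The min over k makes each score >= 0, so calibrated play drives
    every score to 0.
    """
    n = len(pred_idx)
    scores = {}
    for l in set(pred_idx):
        stages = [m for m, p in enumerate(pred_idx) if p == l]
        avg = np.mean([outcomes[m] for m in stages], axis=0)
        own = float(np.sum((avg - grid[l]) ** 2))
        best = min(float(np.sum((avg - g) ** 2)) for g in grid)
        scores[l] = (len(stages) / n) * (own - best)
    return scores
```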


Partial Monitoring: No-Regret with Random Signals

Player 1 observes a signal s_n, whose law is s(i_n, j_n); payoffs are real and are not observed.

s(y) := (s(1, y), …, s(I, y)) ∈ M ⊂ ∆(S)^I is called a flag: it is the maximal information available about y.

Worst-case evaluation function:

W(x, µ) = inf_{y ∈ s^{-1}(µ)} ρ(x, y) if µ ∈ M, and W(x, µ) = W(x, Π_M(µ)) otherwise,

with Π_M the projection onto M.

We assume that P1's strategy is represented by a finite set of mixed actions {x(l) ∈ ∆(I), l ∈ L}: at stage n he chooses l_n at random, and given that choice i_n is drawn according to x(l_n).

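For finite J, and restricting the infimum to Nature's pure actions for simplicity, W can be sketched as follows. The flag-matching tolerance is an assumption of ours, and the projection step onto M for infeasible µ is omitted:

```python
import numpy as np

def worst_case_eval(x, mu, payoff, flags, tol=1e-9):
    """Worst-case evaluation W(x, mu), pure-action sketch.

    payoff: array of shape (I, J) with payoff[i, j] = rho(i, j).
    flags:  flags[j] is the flag s(j) in Delta(S)^I, flattened to a vector.
    Returns the minimum of rho(x, j) over pure actions j compatible
    with the flag mu.
    """
    vals = [
        float(x @ payoff[:, j])
        for j, f in enumerate(flags)
        if np.linalg.norm(np.asarray(f) - np.asarray(mu)) <= tol
    ]
    if not vals:
        raise ValueError("mu is not a feasible flag; project onto M first")
    return min(vals)
```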


Partial Monitoring: Internal Consistency

(L, ε)-consistency [Lehrer-Solan]. A strategy σ of P1 is (L, ε)-internally consistent if for every strategy τ of Nature, P_{σ,τ}-a.s., for every l ∈ L:

lim sup_{n→∞} (|N_n(l)|/n) ( max_{z∈∆(I)} W(z, µ̄_n(l)) − W(x(l), µ̄_n(l)) − ε ) ≤ 0,

with µ̄_n(l) = ∑_{m∈N_n(l)} s(j_m)/|N_n(l)| the unobserved average flag on N_n(l) = {m ≤ n : x_m = x(l)}.

Construction of (L, ε)-consistent strategies [P.]. For every ε > 0, there exist (L, ε)-internally consistent strategies.


Previous Approaches

External regret:
- Rustichini: existence of externally consistent strategies.
- Lugosi, Mannor, Stoltz: construction of such a strategy, playing by blocks of size m. Compute the empirical flag on a block of length m, then the worst possible payoff compatible with this flag for every pure action of P1, and decide what to play on the next block according to a (similar) exponentially weighted algorithm. Regret bounded in O(n^{-1/5}).

Internal regret:
- Lehrer-Solan: existence of internally consistent strategies (under a stronger assumption on s).


Proof: Main Ideas

- δ-discretize ∆(S)^I with {µ(l), l ∈ L}; for every l ∈ L, choose x(l) that maximizes z ↦ W(z, µ(l)).
- Construct a calibrated strategy (the outcome is s_n, the set of predictions is {µ(l), l ∈ L}); whenever µ(l) is predicted, play according to x(l).
- Calibration: µ̄_n(l) is closer to µ(l) than to any µ(k), hence ‖µ̄_n(l) − µ(l)‖ ≤ δ.
- Continuity of W: |W(z, µ̄_n(l)) − W(z, µ(l))| ≤ ε.
- Choice of x(l): it maximizes W(·, µ(l)), hence 2ε-maximizes W(·, µ̄_n(l)).

The strategy is therefore (L, 2ε)-internally consistent.

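The δ-discretization in the first step can be realized with the standard stars-and-bars grid of the simplex. A sketch under that choice (function name is ours):

```python
import itertools
import numpy as np

def simplex_grid(dim, k):
    """All points of Delta(dim) whose coordinates are multiples of 1/k.

    There are C(k + dim - 1, dim - 1) such points, and every point of
    the simplex is within O(dim/k) of one of them in norm, so taking k
    large enough yields the delta-discretization used in the proof.
    """
    points = []
    for cut in itertools.combinations(range(k + dim - 1), dim - 1):
        # stars and bars: cut points encode a composition of k into dim parts
        cuts = (-1,) + cut + (k + dim - 1,)
        parts = [cuts[t + 1] - cuts[t] - 1 for t in range(dim)]
        points.append(np.array(parts) / k)
    return points
```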

Proof: More Details on Calibration

The outcome in the calibration procedure should not depend on Player 1's action, and µ_n has to be estimated. Perturb x(l) into x̂(l) = (1 − γ)x(l) + γu, with u the uniform distribution over I and γ > 0 a parameter, and define the estimator

ŝ_n = ( 1{s = s_n} 1{i = i_n} / x̂(l_n)[i_n] )_{(s,i) ∈ S×I} ∈ Ω ⊂ R^{S×I}, so that E_{σ,τ}[ŝ_n] = µ_n.

Run the calibrated strategy with outcome ŝ_n and set of predictions {µ(l), l ∈ L} (seen as elements of the compact set Ω). If the average of ŝ_n over N_n(l) is within δ of µ(l), then ‖µ̄_n(l) − µ(l)‖ ≤ 2δ; moreover ‖x̂(l) − x(l)‖ ≤ 2γ.

By continuity of W, the strategy is (L, ε)-internally consistent.

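The estimator ŝ_n is standard importance weighting over the exploring mixed action. A hedged sketch, laying the estimate out as an S × I table (the shapes and names are our convention):

```python
import numpy as np

def flag_estimator(s_obs, i_obs, x_hat, n_signals):
    """One-stage unbiased estimate of the flag mu_n (illustrative sketch).

    x_hat: the perturbed mixed action used to draw i_obs; every
    coordinate is at least gamma/|I|, so the weights stay bounded.
    Entry (s, i) is 1{s = s_obs, i = i_obs} / x_hat[i_obs]; averaging
    over i ~ x_hat and the signal law recovers (s(1, j), ..., s(I, j)).
    """
    est = np.zeros((n_signals, len(x_hat)))
    est[s_obs, i_obs] = 1.0 / x_hat[i_obs]
    return est
```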


Advantages of the Calibrated Strategy (1)

From partial to full monitoring: do not work in the space of payoffs (which is in partial monitoring), but directly in the space of signals, which is (almost) in full monitoring.

A single assumption: W is continuous. The result holds for any continuous function G on ∆(I) × ∆(S)^I (e.g. the optimistic case G(x, µ) = sup_{y∈s^{-1}(µ)} ρ(x, y)).

The size of J, and even its finiteness, plays no role: the same result holds with J = [−1, 1]^I and s : [−1, 1]^I ⇒ ∆(S)^I. If s is convex and its range is a polyhedron, then W is continuous.


Advantages of the Calibrated Strategy (2)

Rates of convergence. For any continuous function G on ∆(I) × ∆(S)^I:

E_{σ,τ}[ (|N_n(l)|/n) ( max_{z∈∆(I)} G(z, µ̄_n(l)) − G(x(l), µ̄_n(l)) − ε )⁺ ] = O(1/n^{1/2}).

For the worst-case function W (P. - choosing each µ(l) properly):

E_{σ,τ}[ (|N_n(l)|/n) ( max_{z∈∆(I)} W(z, µ̄_n(l)) − W(x(l), µ̄_n(l)) )⁺ ] = O(1/n^{1/3}).

Approachability with random signals (P.): complete characterization of the approachable convex sets C under partial monitoring:

∀µ ∈ ∆(S)^I, ∃x ∈ ∆(I), { ρ(x, y), y ∈ s^{-1}(µ) } ⊂ C.

Vianney Perchet

Calibration and Internal No-Regret with Random Signals

Conclusion- on going work

Advantages of Calibrated strategy(2) Rates of convergence: For any continuous function G : ∆(I) × ∆(S)I "    + #  |Nn (l)| 1 Eσ ,τ max G (z, µ n (l)) − G (x(l), µ n (l)) − ε = O 1/2 . n n z∈∆(I) For the worst case function W, (P. - choose properly each µ(l)) "   + #  1 |Nn (l)| = O 1/3 . max W (z, µ n (l)) − W (x(l), µ n (l)) Eσ ,τ n n z∈∆(I) Approachability with random signals P. - Complete characterization of approachable convex sets with partial monitoring:  ∀µ ∈ ∆(S)I , ∃x ∈ ∆(I), ρ(x, y), y ∈ s−1 (µ) ⊂ C 19

Vianney Perchet

Calibration and Internal No-Regret with Random Signals

Conclusion- on going work

Advantages of Calibrated strategy(2) Rates of convergence: For any continuous function G : ∆(I) × ∆(S)I "    + #  |Nn (l)| 1 Eσ ,τ max G (z, µ n (l)) − G (x(l), µ n (l)) − ε = O 1/2 . n n z∈∆(I) For the worst case function W, (P. - choose properly each µ(l)) "   + #  1 |Nn (l)| = O 1/3 . max W (z, µ n (l)) − W (x(l), µ n (l)) Eσ ,τ n n z∈∆(I) Approachability with random signals P. - Complete characterization of approachable convex sets with partial monitoring:  ∀µ ∈ ∆(S)I , ∃x ∈ ∆(I), ρ(x, y), y ∈ s−1 (µ) ⊂ C 19

Vianney Perchet

Calibration and Internal No-Regret with Random Signals

Concluding Scheme

Approachability of an orthant (in full monitoring)
⇓ No-regret (in full monitoring)
⇓ Calibration (in full monitoring)
⇓ No-regret (in partial monitoring)
⇓ Approachability of a convex set (in partial monitoring)

Each ⇓ corresponds to the construction of an explicit strategy (finding an invariant measure reduces to solving a system of linear equations).


References

Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games, Cambridge University Press (2006), Chapters 4 and 6. [Book covering these tools]
Blackwell, D. An analog of the minimax theorem for vector payoffs, Pacific J. Math. (1956). [Approachability]
Foster, D. P. and Vohra, R. V. Asymptotic calibration, Biometrika (1998). [Existence of calibrated strategies]
Hart, S. and Mas-Colell, A. A simple adaptive procedure leading to correlated equilibrium, Econometrica (2000). [No-regret and approachability]
Sorin, S. Lectures on Dynamics in Games, unpublished lecture notes (2008). [No-regret and calibration]
Rustichini, A. Minimizing regret: the general case, Games Econom. Behav. (1999). [Existence of strategies with no external regret under partial monitoring]
Lugosi, G., Mannor, S. and Stoltz, G. Strategies for prediction under imperfect monitoring, Math. Oper. Res. (2008). [Construction of strategies with no external regret]
Perchet, V. Approachability of a Convex Set with Partial Monitoring, manuscript.
Perchet, V. Algorithms for No-Regret in O(n^{-1/3}) with Random Signals, manuscript.
Lehrer, E. and Solan, E. Learning to Play Partially Specified Correlated Equilibrium, manuscript.
