Introduction
Model
Main Results
Empirical study
Discussion
How to elicit information when it is not possible to verify the answer

David C. Parkes
Computer Science, John A. Paulson School of Engineering and Applied Sciences, Harvard University

June 2, 2016

Joint work with Arpit Agarwal (IISc Bangalore), Rafael Frongillo (CU Boulder), and Victor Shnayder (Harvard).
Example: Search results
Which shopping search result is best?
Response options: About the same / Left side / Right side
Example: Content evaluation
What emotion do you feel?
Response options: Happy / Astonished / Scared
Information elicitation without verification
Examples: search feedback, content evaluation, Google local places, marketing, surveys, ...
Problem characteristics:
- May require effort to form an informed opinion.
- Disagreement is likely.
- We want individual opinions, not guesses at others' likely reports.
- No external input: it is not possible to verify the answer.
The question: can we design payment schemes to promote effort and truthful responses?
Peer Prediction
Coined by Miller, Resnick and Zeckhauser (2005). Pay an agent according to how well its report "predicts" the report of another. The world model is a joint distribution on "signals" (X1, X2). Example:

P(X1, X2) =
  0.30  0.10  0.00
  0.05  0.05  0.05
  0.00  0.15  0.30
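As a concrete companion to this example, a small Python sketch (pure standard library; the variable names are mine) computing from this joint distribution the Δ matrix that the later slides build on:

```python
# Example joint signal distribution P(X1, X2) from the slide (rows: agent 1's
# signal i; columns: agent 2's signal j).
P = [
    [0.30, 0.10, 0.00],
    [0.05, 0.05, 0.05],
    [0.00, 0.15, 0.30],
]
m = len(P)

# Marginal distributions of each agent's signal.
p1 = [sum(P[i]) for i in range(m)]                        # P(X1 = i)
p2 = [sum(P[i][j] for i in range(m)) for j in range(m)]   # P(X2 = j)

# Delta matrix: Delta[i][j] = P(i, j) - P(i) * P(j).  A positive entry
# marks a signal pair that is positively correlated.
Delta = [[P[i][j] - p1[i] * p2[j] for j in range(m)] for i in range(m)]
```

By construction every row and every column of Δ sums to zero, a fact the later analysis relies on.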
Example 1: Output agreement
Payment rule (each agent is paid 1 when the reports agree, 0 otherwise):

           r2 = 1   r2 = 2
  r1 = 1   (1, 1)   (0, 0)
  r1 = 2   (0, 0)   (1, 1)

Truth is a correlated equilibrium if P(X2 = 1 | X1 = 1) > P(X2 = 2 | X1 = 1). But the uninformative strategy profile in which both agents always report 1 dominates.
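The dominance problem can be seen numerically. A sketch with an illustrative 2×2 joint distribution of my own choosing (any positively correlated joint shows the same effect):

```python
# Illustrative 2x2 joint distribution P(X1, X2); the numbers are my own,
# chosen so that the two agents' signals are positively correlated.
P = [[0.40, 0.10],
     [0.10, 0.40]]

def expected_payment(strat1, strat2):
    """Expected output-agreement payment: 1 if reports match, else 0.
    strat[i] is the report an agent makes on seeing signal i."""
    return sum(P[i][j]
               for i in range(2) for j in range(2)
               if strat1[i] == strat2[j])

truthful = expected_payment([0, 1], [0, 1])   # both report the observed signal
all_ones = expected_payment([0, 0], [0, 0])   # both always report signal 0
# all_ones > truthful: the uninformative profile earns strictly more.
```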
Example 2: 1/Prior Mechanism (Faltings et al., 2012)
The designer knows the marginal probability P(X).

           r2 = 1                r2 = 2
  r1 = 1   (1/P(1), 1/P(1))      (0, 0)
  r1 = 2   (0, 0)                (1/P(2), 1/P(2))

Truthful reporting is a correlated equilibrium. But suppose P(1) < P(2); then the profile in which both agents always report 1 dominates. A similar story holds for other mechanisms (e.g., MRZ'15).
A New Goal: Informed Truthfulness
Definition (Informed Truthfulness)
1. The truthful strategy profile provides as much expected payoff as any other strategy profile.
2. Any uninformed strategy provides strictly less.
This is responsive to the main concerns: invest effort, and report truthfully given effort.
Main results
Multi-task peer prediction: the correlated agreement (CA) mechanism is informed truthful for n ≥ 2 agents and k ≥ 3 tasks. It needs the signal correlation structure. We can also learn the correlation structure from reports, and attain ε-informed truthfulness. Prior work was for binary signals (DG13), or an asymptotically large number of reports (RF15, RFJ16, K+15). Independent work: Kong and Schoenebeck (arXiv 2016).
Motivation: mTurk Experiments! (Gao et al., 2014)
[Figure: replicator dynamics (SFP16) for (a) output agreement and (b) the 1/prior mechanism; MM = M&Ms, GB = Gummy Bear]
Multi-task Peer Prediction
Agents 1 and 2 (n in general). Multiple tasks, k (≥ 3). Some tasks are bonus tasks. Signals i, j ∈ {1, . . . , m} of agents 1 and 2 on a task. Joint signal distribution P(X1, X2). Delta matrix Δ: Δij = P(i, j) − P(i)P(j). If Δij > 0, signals i and j are positively correlated.
Agent Behavior
Strategies:
Agent 1: Fir = P(r1 = r | X1 = i)
Agent 2: Gjr = P(r2 = r | X2 = j)
Expected payment E(F, G) for a bonus task.
Definition (Informed truthful mechanisms)
Expected payments satisfy:
1. E(F*, G*) ≥ E(F, G), for all F, all G.
2. E(F*, G*) > E(F°, G), for all F°, all G.
Here * denotes a truthful strategy and ° an uninformed strategy.
Family of Mechanisms (following DG13)
Parameterized by a score S : {1, . . . , m} × {1, . . . , m} → R.
Definition (Multi-task mechanism)
For a bonus task b, also pick some task ℓ ∉ Tb assigned to agent 1 and some task ℓ′ ∉ Tb assigned to agent 2. Pay each agent:
S(r1b, r2b) − S(r1ℓ, r2ℓ′).
Example score matrix:
S =
  1 1 0
  1 1 0
  0 0 1
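The payment rule above can be sketched directly; the report data structure and task names here are my own illustration:

```python
# Sketch of the multi-task payment rule (following DG13).  S is the score
# matrix; r1 and r2 map task names to each agent's report on that task.
def payment(S, r1, r2, bonus, l1, l2):
    """Pay S(r1[bonus], r2[bonus]) - S(r1[l1], r2[l2]), where l1 and l2 are
    non-bonus tasks assigned to agents 1 and 2 respectively."""
    return S[r1[bonus]][r2[bonus]] - S[r1[l1]][r2[l2]]

S = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]          # example score matrix from the slide
r1 = {"b": 0, "t3": 2}   # hypothetical reports of agent 1
r2 = {"b": 1, "t7": 2}   # hypothetical reports of agent 2
pay = payment(S, r1, r2, "b", "t3", "t7")  # S[0][1] - S[2][2] = 1 - 1 = 0
```

Subtracting the score on the non-bonus task pair removes the baseline payoff from blind agreement.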
The Correlated Agreement mechanism
Definition (CA mechanism)
Adopt the score matrix S(i, j) = 1 if Δij > 0, with S(i, j) = 0 otherwise.
Theorem 1. The CA mechanism is informed-truthful.
Example:
sgn(Δ):
  + + −
  + + −
  − − +
S:
  1 1 0
  1 1 0
  0 0 1
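The construction of S from Δ is a one-liner; the Δ magnitudes below are my own, chosen to match the slide's sign pattern:

```python
# CA score matrix: S[i][j] = 1 exactly when Delta[i][j] > 0.
def ca_score_matrix(Delta):
    return [[1 if d > 0 else 0 for d in row] for row in Delta]

# Illustrative Delta with the sign pattern from the slide's example
# (magnitudes are my own; rows and columns sum to zero).
Delta = [[ 0.2,  0.1, -0.3],
         [ 0.1,  0.2, -0.3],
         [-0.3, -0.3,  0.6]]
S = ca_score_matrix(Delta)
# S == [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```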
Analysis of the CA mechanism
The expected payment on a bonus task is:

  E(F, G) = Σij P(i, j) S(Fi, Gj) − Σij P(i) P(j) S(Fi, Gj)
          = Σij Δij · S(Fi, Gj)

Incentives: agents want to score 1 when Δij > 0, and 0 otherwise. (F*, G*) achieves exactly this in the CA mechanism:
sgn(Δ):
  + + −
  + + −
  − − +
S:
  1 1 0
  1 1 0
  0 0 1
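The reduced form E(F, G) = Σij Δij · S(Fi, Gj) is easy to evaluate directly. A sketch using illustrative Δ magnitudes of my own that match the sign pattern above:

```python
import itertools

def expected_payment(Delta, S, F, G):
    """E(F, G) = sum_ij Delta[i][j] * S(F_i, G_j), where S(F_i, G_j) is the
    expected score when agent 1 mixes over reports with row F[i] and agent 2
    with row G[j]."""
    m = len(Delta)
    return sum(Delta[i][j] * F[i][r] * G[j][q] * S[r][q]
               for i, j, r, q in itertools.product(range(m), repeat=4))

Delta = [[ 0.2,  0.1, -0.3],
         [ 0.1,  0.2, -0.3],
         [-0.3, -0.3,  0.6]]              # illustrative magnitudes, my own
S = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]     # CA score matrix for this Delta
truthful = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # report the signal as observed
best = expected_payment(Delta, S, truthful, truthful)
# best equals the sum of the positive entries of Delta.
```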
Analysis of the CA mechanism (cont.)
Theorem 1. The CA mechanism is informed-truthful.
(1) Agents can do no better than reporting truthfully:

  E(F*, G*) = Σij Δij · S(i, j) = Σ{i,j : Δij > 0} Δij ≥ Σij Δij · S(Fi, Gj) = E(F, G).

(2) Any uninformed strategy is worse (e.g., F° = 'always report 1'):

  E(F°, G) = Σi Σj Δij · S(1, Gj) = Σj S(1, Gj) Σi Δij = 0 < E(F*, G*),

since each column of Δ sums to zero.
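Step (2) can also be checked numerically: because every column of Δ sums to zero, a constant-report strategy earns exactly zero against any G. A sketch (Δ magnitudes are my own illustration):

```python
def uninformed_payment(Delta, S, G, fixed=0):
    """Expected payment when agent 1 ignores its signal and always reports
    `fixed`.  The sum factors as sum_j [E S(fixed, G_j)] * (column sum of
    Delta), and every column of Delta sums to zero."""
    m = len(Delta)
    return sum(Delta[i][j] * G[j][q] * S[fixed][q]
               for i in range(m) for j in range(m) for q in range(m))

Delta = [[ 0.2,  0.1, -0.3],
         [ 0.1,  0.2, -0.3],
         [-0.3, -0.3,  0.6]]
S = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
G = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # agent 2 truthful
zero = uninformed_payment(Delta, S, G)  # 0 up to floating-point error
```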
Special case: Categorical Domains
sgn(Δ):
  + − −
  − + −
  − − +
S:
  1 0 0
  0 1 0
  0 0 1
Consider an image-labeling task {swim, fly, walk}, vs. feedback {3*, 4*, 5*}.
Theorem 2. The CA mechanism is strong truthful in a categorical domain.
Strong truthful: truthful behavior earns a strictly higher expected payment than any other strategy profile (except permutations).
Strong-truthfulness in General Domains
We can show:
- It is impossible to achieve strong-truthfulness on additional signal distributions while using only the signal correlation structure.
- There are symmetric signal distributions for which no multi-task mechanism is strongly truthful.
A Detail-Free Mechanism
CA-DF mechanism:
1. Estimate the correlation structure from reports on k tasks.
2. Use this to define the score matrix S.
Idea: the "truthful score matrix" maximizes expected payment.
Theorem 3 (Informal). For O(m³ log(1/δ)/ε²) tasks, with probability at least 1 − δ:
1. No strategy profile is more than ε better than truth.
2. Any uninformed strategy is worse than truth.
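Step 1 of CA-DF amounts to plugging empirical frequencies into the definition of Δ. A minimal sketch (the pairing of reports across tasks is simplified, and the sample data is my own illustration):

```python
from collections import Counter

def estimate_delta(pairs, m):
    """Empirical Delta-hat from paired reports on shared tasks.
    `pairs` is a list of (report1, report2) tuples, one per task:
    Delta_hat[i][j] = freq(i, j) - freq1(i) * freq2(j)."""
    n = len(pairs)
    joint = Counter(pairs)
    m1 = Counter(r1 for r1, _ in pairs)   # agent 1's report frequencies
    m2 = Counter(r2 for _, r2 in pairs)   # agent 2's report frequencies
    return [[joint[(i, j)] / n - (m1[i] / n) * (m2[j] / n)
             for j in range(m)] for i in range(m)]

# CA-DF then scores with S[i][j] = 1 iff Delta_hat[i][j] > 0.
pairs = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (1, 0), (2, 2), (2, 2)]
D = estimate_delta(pairs, 3)  # diagonal entries come out positive here
```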
Peer-Assessment Domains
325,000 peer-assessment responses to ∼100 questions across ∼30 exercises in 17 MOOCs. The vast majority of questions have m ∈ {2, 3, 4}. Example rubric element: "Not much of a style at all", "Communicative style", and "Strong, flowing writing style".
[Figure: histograms of question counts by number of signal values m ∈ {2, 3, 4, 5}, split by positively correlated vs. not, and by categorical vs. not]
Peer-Assessment Domains (cont.)
Correlation structure:
[Figure: average Δ matrices for m = 2, 3, 4, 5]
Two thirds of the worlds are not categorical. Rather, a response '3' tends to be positively correlated with a '2.'
Summary: Multi-Task Peer Prediction
A way to elicit information from participants without any external inputs. Promotes differences of opinion, not 'group think.' The CA mechanism is informed truthful: truthful reporting is the best possible, and anything uninformed is strictly worse. It is maximally strong truthful. The detail-free CA mechanism can estimate the needed statistics.
Future Work
Extend to non-binary effort models. Combine with label-aggregation models (Dawid-Skene 1979); allow for heterogeneity amongst agents. Handle large signal spaces and endogenous task selection. Large-scale validation.
Thank you