Introduction
Model
Main Results
Empirical study
Discussion
How to elicit information when it is not possible to verify the answer

David C. Parkes
Computer Science, John A. Paulson School of Engineering and Applied Sciences, Harvard University

June 2, 2016

Joint work with Arpit Agarwal (IISc Bangalore), Rafael Frongillo (CU Boulder), and Victor Shnayder (Harvard).
Example: Search results
Which shopping search result is best?
Response options: About the same / Left side / Right side
Example: Content evaluation
What emotion do you feel?
Response options: Happy / Astonished / Scared
Information elicitation without verification
Examples: search feedback, content evaluation, Google local places, marketing, surveys, ...
Problem characteristics:
- May require effort to form an informed opinion.
- Disagreement is likely.
- We want individual opinions, not guesses at others' likely reports.
- No external input: it is not possible to verify the answer.
The question: can we design payment schemes to promote effort and truthful responses?
Peer Prediction
Coined by Miller, Resnick and Zeckhauser (2005). Pay an agent according to how well its report "predicts" the report of another. The world model is a joint distribution on "signals" (X1, X2). Example:

P(X1, X2) =
  0.30  0.10  0.00
  0.05  0.05  0.05
  0.00  0.15  0.30
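As a concrete companion to this example, a small Python sketch (pure standard library; the variable names are mine) computing from this joint distribution the Δ matrix that the later slides build on:

```python
# Example joint signal distribution P(X1, X2) from the slide (rows: agent 1's
# signal i; columns: agent 2's signal j).
P = [
    [0.30, 0.10, 0.00],
    [0.05, 0.05, 0.05],
    [0.00, 0.15, 0.30],
]
m = len(P)

# Marginal distributions of each agent's signal.
p1 = [sum(P[i]) for i in range(m)]                        # P(X1 = i)
p2 = [sum(P[i][j] for i in range(m)) for j in range(m)]   # P(X2 = j)

# Delta matrix: Delta[i][j] = P(i, j) - P(i) * P(j).  A positive entry
# marks a signal pair that is positively correlated.
Delta = [[P[i][j] - p1[i] * p2[j] for j in range(m)] for i in range(m)]
```

By construction every row and every column of Δ sums to zero, a fact the later analysis relies on.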
Example 1: Output agreement
Payment rule (each agent is paid 1 when the reports agree, 0 otherwise):

           r2 = 1   r2 = 2
  r1 = 1   (1, 1)   (0, 0)
  r1 = 2   (0, 0)   (1, 1)

Truth is a correlated equilibrium if P(X2 = 1 | X1 = 1) > P(X2 = 2 | X1 = 1). But the uninformative strategy profile in which both agents always report 1 dominates.
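The dominance problem can be seen numerically. A sketch with an illustrative 2×2 joint distribution of my own choosing (any positively correlated joint shows the same effect):

```python
# Illustrative 2x2 joint distribution P(X1, X2); the numbers are my own,
# chosen so that the two agents' signals are positively correlated.
P = [[0.40, 0.10],
     [0.10, 0.40]]

def expected_payment(strat1, strat2):
    """Expected output-agreement payment: 1 if reports match, else 0.
    strat[i] is the report an agent makes on seeing signal i."""
    return sum(P[i][j]
               for i in range(2) for j in range(2)
               if strat1[i] == strat2[j])

truthful = expected_payment([0, 1], [0, 1])   # both report the observed signal
all_ones = expected_payment([0, 0], [0, 0])   # both always report signal 0
# all_ones > truthful: the uninformative profile earns strictly more.
```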
Example 2: 1/Prior Mechanism (Faltings et al., 2012)
The designer knows the marginal probability P(X).

           r2 = 1                r2 = 2
  r1 = 1   (1/P(1), 1/P(1))      (0, 0)
  r1 = 2   (0, 0)                (1/P(2), 1/P(2))

Truthful reporting is a correlated equilibrium. But suppose P(1) < P(2); then the profile in which both agents always report 1 dominates. A similar story holds for other mechanisms (e.g., MRZ'15).
A New Goal: Informed Truthfulness
Definition (Informed Truthfulness)
1. The truthful strategy profile provides as much expected payoff as any other strategy profile.
2. Any uninformed strategy provides strictly less.
This is responsive to the main concerns: invest effort, and report truthfully given effort.
Main results
Multi-task peer prediction: the correlated agreement (CA) mechanism is informed truthful for n ≥ 2 agents and k ≥ 3 tasks. It needs the signal correlation structure. We can also learn the correlation structure from reports, and attain ε-informed truthfulness. Prior work was for binary signals (DG13), or an asymptotically large number of reports (RF15, RFJ16, K+15). Independent work: Kong and Schoenebeck (arXiv 2016).
Motivation: mTurk Experiments! (Gao et al., 2014)
[Figure: replicator dynamics (SFP16) for (a) output agreement and (b) the 1/prior mechanism; MM = M&Ms, GB = Gummy Bear]
Multi-task Peer Prediction
Agents 1 and 2 (n in general). Multiple tasks, k (≥ 3). Some tasks are bonus tasks. Signals i, j ∈ {1, . . . , m} of agents 1 and 2 on a task. Joint signal distribution P(X1, X2). Delta matrix Δ: Δij = P(i, j) − P(i)P(j). If Δij > 0, signals i and j are positively correlated.
Agent Behavior
Strategies:
Agent 1: Fir = P(r1 = r | X1 = i)
Agent 2: Gjr = P(r2 = r | X2 = j)
Expected payment E(F, G) for a bonus task.
Definition (Informed truthful mechanisms)
Expected payments satisfy:
1. E(F*, G*) ≥ E(F, G), for all F, all G.
2. E(F*, G*) > E(F°, G), for all F°, all G.
Here * denotes a truthful strategy and ° an uninformed strategy.
Family of Mechanisms (following DG13)
Parameterized by a score S : {1, . . . , m} × {1, . . . , m} → R.
Definition (Multi-task mechanism)
For a bonus task b, also pick some task ℓ ∉ Tb assigned to agent 1 and some task ℓ′ ∉ Tb assigned to agent 2. Pay each agent:
S(r1b, r2b) − S(r1ℓ, r2ℓ′).
Example score matrix:
S =
  1 1 0
  1 1 0
  0 0 1
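The payment rule above can be sketched directly; the report data structure and task names here are my own illustration:

```python
# Sketch of the multi-task payment rule (following DG13).  S is the score
# matrix; r1 and r2 map task names to each agent's report on that task.
def payment(S, r1, r2, bonus, l1, l2):
    """Pay S(r1[bonus], r2[bonus]) - S(r1[l1], r2[l2]), where l1 and l2 are
    non-bonus tasks assigned to agents 1 and 2 respectively."""
    return S[r1[bonus]][r2[bonus]] - S[r1[l1]][r2[l2]]

S = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]          # example score matrix from the slide
r1 = {"b": 0, "t3": 2}   # hypothetical reports of agent 1
r2 = {"b": 1, "t7": 2}   # hypothetical reports of agent 2
pay = payment(S, r1, r2, "b", "t3", "t7")  # S[0][1] - S[2][2] = 1 - 1 = 0
```

Subtracting the score on the non-bonus task pair removes the baseline payoff from blind agreement.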
The Correlated Agreement mechanism
Definition (CA mechanism)
Adopt the score matrix S(i, j) = 1 if Δij > 0, with S(i, j) = 0 otherwise.
Theorem 1. The CA mechanism is informed-truthful.
Example:
sgn(Δ):
  + + −
  + + −
  − − +
S:
  1 1 0
  1 1 0
  0 0 1
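The construction of S from Δ is a one-liner; the Δ magnitudes below are my own, chosen to match the slide's sign pattern:

```python
# CA score matrix: S[i][j] = 1 exactly when Delta[i][j] > 0.
def ca_score_matrix(Delta):
    return [[1 if d > 0 else 0 for d in row] for row in Delta]

# Illustrative Delta with the sign pattern from the slide's example
# (magnitudes are my own; rows and columns sum to zero).
Delta = [[ 0.2,  0.1, -0.3],
         [ 0.1,  0.2, -0.3],
         [-0.3, -0.3,  0.6]]
S = ca_score_matrix(Delta)
# S == [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```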
Analysis of the CA mechanism
The expected payment on a bonus task is:

  E(F, G) = Σij P(i, j) S(Fi, Gj) − Σij P(i) P(j) S(Fi, Gj)
          = Σij Δij · S(Fi, Gj)

Incentives: agents want to score 1 when Δij > 0, and 0 otherwise. (F*, G*) achieves exactly this in the CA mechanism:
sgn(Δ):
  + + −
  + + −
  − − +
S:
  1 1 0
  1 1 0
  0 0 1
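The reduced form E(F, G) = Σij Δij · S(Fi, Gj) is easy to evaluate directly. A sketch using illustrative Δ magnitudes of my own that match the sign pattern above:

```python
import itertools

def expected_payment(Delta, S, F, G):
    """E(F, G) = sum_ij Delta[i][j] * S(F_i, G_j), where S(F_i, G_j) is the
    expected score when agent 1 mixes over reports with row F[i] and agent 2
    with row G[j]."""
    m = len(Delta)
    return sum(Delta[i][j] * F[i][r] * G[j][q] * S[r][q]
               for i, j, r, q in itertools.product(range(m), repeat=4))

Delta = [[ 0.2,  0.1, -0.3],
         [ 0.1,  0.2, -0.3],
         [-0.3, -0.3,  0.6]]              # illustrative magnitudes, my own
S = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]     # CA score matrix for this Delta
truthful = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # report the signal as observed
best = expected_payment(Delta, S, truthful, truthful)
# best equals the sum of the positive entries of Delta.
```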
Analysis of the CA mechanism (cont.)
Theorem 1. The CA mechanism is informed-truthful.
(1) Agents can do no better than reporting truthfully:

  E(F*, G*) = Σij Δij · S(i, j) = Σ{i,j : Δij > 0} Δij ≥ Σij Δij · S(Fi, Gj) = E(F, G).

(2) Any uninformed strategy is worse (e.g., F° = 'always report 1'):

  E(F°, G) = Σi Σj Δij · S(1, Gj) = Σj S(1, Gj) Σi Δij = 0 < E(F*, G*),

since each column of Δ sums to zero.
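Step (2) can also be checked numerically: because every column of Δ sums to zero, a constant-report strategy earns exactly zero against any G. A sketch (Δ magnitudes are my own illustration):

```python
def uninformed_payment(Delta, S, G, fixed=0):
    """Expected payment when agent 1 ignores its signal and always reports
    `fixed`.  The sum factors as sum_j [E S(fixed, G_j)] * (column sum of
    Delta), and every column of Delta sums to zero."""
    m = len(Delta)
    return sum(Delta[i][j] * G[j][q] * S[fixed][q]
               for i in range(m) for j in range(m) for q in range(m))

Delta = [[ 0.2,  0.1, -0.3],
         [ 0.1,  0.2, -0.3],
         [-0.3, -0.3,  0.6]]
S = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
G = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # agent 2 truthful
zero = uninformed_payment(Delta, S, G)  # 0 up to floating-point error
```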
Special case: Categorical Domains
sgn(Δ):
  + − −
  − + −
  − − +
S:
  1 0 0
  0 1 0
  0 0 1
Consider an image-labeling task {swim, fly, walk}, vs. feedback {3*, 4*, 5*}.
Theorem 2. The CA mechanism is strong truthful in a categorical domain.
Strong truthful: truthful behavior earns a strictly higher expected payment than any other strategy profile (except permutations).
Strong-truthfulness in General Domains
We can show:
- It is impossible to achieve strong-truthfulness on additional signal distributions while using only the signal correlation structure.
- There are symmetric signal distributions for which no multi-task mechanism is strongly truthful.
A Detail-Free Mechanism
CA-DF mechanism:
1. Estimate the correlation structure from reports on k tasks.
2. Use this to define the score matrix S.
Idea: the "truthful score matrix" maximizes expected payment.
Theorem 3 (Informal). For O(m³ log(1/δ)/ε²) tasks, with probability at least 1 − δ:
1. No strategy profile is more than ε better than truth.
2. Any uninformed strategy is worse than truth.
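Step 1 of CA-DF amounts to plugging empirical frequencies into the definition of Δ. A minimal sketch (the pairing of reports across tasks is simplified, and the sample data is my own illustration):

```python
from collections import Counter

def estimate_delta(pairs, m):
    """Empirical Delta-hat from paired reports on shared tasks.
    `pairs` is a list of (report1, report2) tuples, one per task:
    Delta_hat[i][j] = freq(i, j) - freq1(i) * freq2(j)."""
    n = len(pairs)
    joint = Counter(pairs)
    m1 = Counter(r1 for r1, _ in pairs)   # agent 1's report frequencies
    m2 = Counter(r2 for _, r2 in pairs)   # agent 2's report frequencies
    return [[joint[(i, j)] / n - (m1[i] / n) * (m2[j] / n)
             for j in range(m)] for i in range(m)]

# CA-DF then scores with S[i][j] = 1 iff Delta_hat[i][j] > 0.
pairs = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (1, 0), (2, 2), (2, 2)]
D = estimate_delta(pairs, 3)  # diagonal entries come out positive here
```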
Peer-Assessment Domains
325,000 peer-assessment responses to ∼100 questions across ∼30 exercises in 17 MOOCs. The vast majority of questions have m ∈ {2, 3, 4}. Example rubric element: "Not much of a style at all", "Communicative style", and "Strong, flowing writing style".
[Figure: histograms of question counts by number of signal values m ∈ {2, 3, 4, 5}, split by positively correlated vs. not, and by categorical vs. not]
Peer-Assessment Domains (cont.)
Correlation structure:
[Figure: average Δ matrices for m = 2, 3, 4, 5]
Two thirds of the worlds are not categorical. Rather, a response '3' tends to be positively correlated with a '2.'
Summary: Multi-Task Peer Prediction
A way to elicit information from participants without any external inputs. Promotes differences of opinion, not 'group think.' The CA mechanism is informed truthful: truthful reporting is the best possible, and anything uninformed is strictly worse. It is maximally strong truthful. The detail-free CA mechanism can estimate the needed statistics.
Future Work
Extend to non-binary effort models. Combine with label-aggregation models (Dawid-Skene 1979); allow for heterogeneity amongst agents. Handle large signal spaces and endogenous task selection. Large-scale validation.
Thank you