A pragmatic characterisation of linear pooling

January 8, 2018

Abstract

How should we determine a group’s collective probabilistic judgments, given the probabilistic judgments of the individuals in the group? A standard answer is given by this condition: the group probability distribution over the propositions should be a weighted average of the individual probability distributions. Call this Linear Pooling. We provide a condition on aggregates that characterises linear pooling: given a utility function shared by all members of the group, if each individual in the group expects one act to have greater utility than another, then the group expects the first act to have greater utility than the second. Call this Pareto. We prove that Linear Pooling and Pareto are equivalent.

In judgment aggregation, we take a group of individuals and their judgments concerning a range of different propositions, and we produce a single set of judgments about those propositions that we take to be the group’s collective judgments. The individuals might be climate scientists, for instance, and the propositions might concern sea level rise by 2100; or they might be members of the board of a company, and the propositions might concern the future market performance of some other company they wish to acquire. In this paper, we’ll look particularly at probabilistic judgments — those we represent by an agent’s credences or degrees of belief. That is, each individual assigns subjective probabilities to each of the propositions in question. And we ask: how should we determine the group’s collective probabilities for those propositions? We won’t give a definitive answer here; rather, we will present a consideration that tells in favour of one popular constraint that we might place on the group probabilities. The constraint is this: the group probability distribution over the propositions should be a weighted average or mixture or convex combination of the individual probability distributions. More formally, the condition is this:

Linear Pooling  Suppose p0, p1, ..., pn are probability functions defined on the same algebra. Then, if p0 is the aggregate of p1, ..., pn, there are non-negative weights α1, ..., αn with ∑_{i=1}^n αi = 1 such that

    p0(−) = α1 p1(−) + ... + αn pn(−)
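To fix ideas, here is a minimal computational sketch of what Linear Pooling requires; the worlds, weights, and variable names are our own illustrative choices rather than anything from the paper:

```python
# Three worlds; propositions are subsets of this space.
worlds = ["w1", "w2", "w3"]

p1 = {"w1": 0.6, "w2": 0.3, "w3": 0.1}   # individual 1's credences in the worlds
p2 = {"w1": 0.2, "w2": 0.2, "w3": 0.6}   # individual 2's credences in the worlds
alphas = [0.25, 0.75]                    # non-negative weights summing to 1

# Linear Pooling: p0(X) = alpha1*p1(X) + alpha2*p2(X) for every proposition X.
p0 = {w: alphas[0] * p1[w] + alphas[1] * p2[w] for w in worlds}

def prob(p, X):
    """Probability of a proposition X (a set of worlds) under credences p."""
    return sum(p[w] for w in X)

X = {"w1", "w2"}
assert abs(prob(p0, X) - (alphas[0] * prob(p1, X) + alphas[1] * prob(p2, X))) < 1e-12
print(prob(p0, X))   # roughly 0.525 = 0.25*0.9 + 0.75*0.4
```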

We will show that, if the aggregate probability function is not a weighted average of the probability functions to be aggregated, then there is a book of bets, each of which p0 will consider favourable, but which together each of the pi will consider unfavourable. That is, there is a binary decision — accept the bets or reject them — such that each pi prefers the second option to the first, while the aggregate p0 prefers the first to the second.

1  Existing characterisations of weighted averaging

Before we make all of this precise, it is worth saying how our consideration in favour of Linear Pooling differs in structure from standard considerations. Usually, in the literature on judgment aggregation — probabilistic and not — we assume that we produce aggregates using a particular rule or method. For instance, in the probabilistic case, if Ω is the set of possible worlds, P(Ω) is the powerset algebra over Ω, and ProbΩ is the set of all probability functions defined on P(Ω), we assume that there is a function τ : ProbΩ × ... × ProbΩ → ProbΩ (with n arguments), such that, if p1, ..., pn are the individual probability functions, then τ(p1, ..., pn) is the aggregate. Carl Wagner (1984) calls such a function a probability aggregation method. We then place conditions not directly on the aggregate itself, but on the probability aggregation method that produces it. For instance, we might consider the following two conditions:

Irrelevance of Alternatives  Suppose p1, ..., pn and p1′, ..., pn′ are two different sets of individual probability functions defined on P(Ω). And suppose X ⊆ Ω. Then if pi(X) = pi′(X) for all 1 ≤ i ≤ n, then τ(p1, ..., pn)(X) = τ(p1′, ..., pn′)(X).

Irrelevance of Alternatives is a dependence condition. That is, it tells us that some feature of the aggregate depends only on some features of the individuals. In this case, the aggregate probability in a proposition depends only on the individual probabilities in that proposition, and not on the individual probabilities in any other proposition. So the latter can change in any possible way, but if the former remain the same, then the aggregate probability in the proposition remains the same.

Zero Preservation  Suppose p1, ..., pn is a set of probability functions defined on P(Ω). And suppose X ⊆ Ω. Then if pi(X) = 0 for all 1 ≤ i ≤ n, then τ(p1, ..., pn)(X) = 0.

Zero Preservation is a unanimity preservation principle. That is, it tells us that if all individual probability functions share a particular sort of property, then the aggregate should share that property too. In this case, the property


is assigning zero to a particular proposition. So, if all individuals assign probability zero to that proposition, so should the aggregate. Carl Wagner (1982) proved that there is only one sort of probability aggregation method that satisfies both Irrelevance of Alternatives and Zero Preservation:

Theorem 1 (Wagner)  If τ satisfies Irrelevance of Alternatives and Zero Preservation, then there are non-negative weights α1, ..., αn with ∑_{i=1}^n αi = 1 such that, for any p1, ..., pn in ProbΩ,

    τ(p1, ..., pn)(−) = α1 p1(−) + ... + αn pn(−)

That is, the only probability aggregation methods that satisfy Wagner’s two conditions are those that satisfy not only Linear Pooling — the conclusion to which I will offer support below, which says that the aggregate of individual probability functions should be a weighted average of them — but also the further condition that we use the same weights for every different set of individual probability functions. When we assume that we pick our aggregates using a probability aggregation method, we make an assumption that Russell et al. (2015) call Functionality. Note that it is quite strong. It is analogous to the Universal Domain condition in Arrow’s Theorem, since it requires that we can define an aggregate for every possible combination of probability functions (Arrow, 1951). But perhaps there are some combinations for which this is simply not possible — perhaps the individuals disagree so significantly that it is not possible to find a probability function that might serve as a meaningful aggregate of them (Bright et al., 2017, Section 5). Note also that Functionality is essential for the sort of characterisation that Wagner gives, and which has dominated the literature on judgment aggregation. For it is not possible to state dependence conditions like Irrelevance of Alternatives without this assumption. Unlike unanimity preservation principles, which say only that the aggregate should have a property if all individuals have it, and where the conditional here might be read as a material rather than subjunctive conditional, dependence conditions are irreducibly modal conditions — they say something not only about the aggregate of the individual probability functions that you actually wish to combine, but also about the aggregates of other possible sets of individual probability functions. And nearly all existing characterisations of weighted average aggregates rely on such dependence conditions and their modal properties — an exception is (Pettigrew, ta). My aim here is to give a characterisation of weighted average aggregates that does not rely on Functionality (i.e. Universal Domain), nor on any conditions on what aggregate you would have produced had the individuals you are aggregating had different probability functions.
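As an illustration of the objects in Wagner’s characterisation, the following sketch (a toy example of our own, with hypothetical names) implements a fixed-weight probability aggregation method of the kind Theorem 1 singles out and spot-checks Zero Preservation over the powerset algebra of a three-world space. Irrelevance of Alternatives holds by construction, since the aggregate probability of each proposition is computed from the individual probabilities of that proposition alone:

```python
from itertools import chain, combinations

worlds = ["w1", "w2", "w3"]

def powerset(ws):
    """All propositions: every subset of the set of worlds."""
    return [frozenset(s) for s in chain.from_iterable(combinations(ws, r) for r in range(len(ws) + 1))]

def prob_of(point_masses, X):
    # point_masses: dict mapping each world to its probability
    return sum(point_masses[w] for w in X)

def linear_pool(profile, weights):
    # A fixed-weight aggregation method tau: tau(p1,...,pn)(X) = sum_i alpha_i * p_i(X).
    return {w: sum(a * p[w] for a, p in zip(weights, profile)) for w in worlds}

p1 = {"w1": 0.5, "w2": 0.5, "w3": 0.0}
p2 = {"w1": 0.2, "w2": 0.8, "w3": 0.0}
p0 = linear_pool([p1, p2], [0.3, 0.7])

# Zero Preservation spot-check: every proposition to which both individuals
# assign probability 0 also gets probability 0 from the aggregate.
for X in powerset(worlds):
    if prob_of(p1, X) == 0 and prob_of(p2, X) == 0:
        assert prob_of(p0, X) == 0
print(p0)   # approximately {'w1': 0.29, 'w2': 0.71, 'w3': 0.0}
```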


2  A new characterisation of weighted averaging

We will present our condition in two equivalent ways: first, we show that an aggregate satisfies Linear Pooling iff it is not vulnerable to a certain sort of expected sure loss; second, we show that Linear Pooling is equivalent to a particular unanimity preservation principle — in fact, a Pareto condition — which does not require Functionality for its formulation.

The most well-known sure loss or Dutch Book argument is the one that aims to establish Probabilism, the claim that your credences or degrees of belief should obey the Kolmogorov probability axioms. Let’s quickly run through this argument to remind ourselves how it works. It is based on two assumptions concerning bets. Formally, a bet B is a random variable, where B(ω) is the amount (in pounds, say) that the agent who takes the bet will receive in world ω. We will be particularly concerned with bets BX,S, where X ⊆ Ω and S ∈ R and

    BX,S(ω) = S if ω ∈ X, and BX,S(ω) = 0 if ω ∉ X.

Thus, at worlds at which X is true, BX,S pays out £S, so that if S is positive, the agent receives £|S|, whereas if S is negative, the agent gives out £|S|; at worlds at which X is false, BX,S pays out £0. We call S the stake of the bet BX,S. If we let Iω be the indicator function of world ω, so that, for any X ⊆ Ω, Iω(X) = 1 if ω ∈ X and Iω(X) = 0 if ω ∉ X, then BX,S(ω) = Iω(X)S.

Fair Price  An agent with credence p in X will consider £pS to be a fair price for the bet BX,S — that is, she will be willing to buy or sell BX,S for £pS.

Package Principle  Suppose an agent considers £z a fair price for the bet B and £z′ a fair price for the bet B′. Then she considers £(z + z′) to be a fair price for the bet B + B′, where (B + B′)(ω) = B(ω) + B′(ω).

It is then possible to show that, if c is not a probability function, then there is a book of bets, one for each proposition, such that the fair price that an agent with c is prepared to pay for that book is greater than the maximum payout of the book; if c is a probability function, there can be no such book of bets. This is the so-called Dutch Book and Converse Dutch Book Theorem. Let’s state this more precisely. Suppose P(Ω) = {X1, ..., Xm}, and c is a credence function defined on P(Ω). And suppose ⟨S1, ..., Sm⟩ is a set of stakes, one for each proposition, and ∑_{j=1}^m B_{Xj,Sj} is the book of bets with those stakes. Then:

(i) If ω ∈ Ω, then the payoff of this book at ω is

    ∑_{j=1}^m B_{Xj,Sj}(ω) = ∑_{j=1}^m Iω(Xj)Sj

(ii) If c is a credence function on P(Ω), then the price that an agent with c would consider fair for this book is

    ∑_{j=1}^m c(Xj)Sj
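Here is a small sketch of (i) and (ii); the two worlds, the incoherent credence function, and the stakes are our own toy choices. It also foreshadows Theorem 2(I) below: for this incoherent c, the fair price of the book strictly exceeds its payout at every world:

```python
# Worlds and the four propositions of the powerset algebra over two worlds.
worlds = ["w1", "w2"]
props = [set(), {"w1"}, {"w2"}, {"w1", "w2"}]

def payoff(stakes, w):
    # (i): the book pays the sum of the stakes of the propositions true at w.
    return sum(S for X, S in zip(props, stakes) if w in X)

def fair_price(c, stakes):
    # (ii): Fair Price plus the Package Principle give sum_j c(X_j) * S_j.
    return sum(c[frozenset(X)] * S for X, S in zip(props, stakes))

# An incoherent credence function: c({w1}) + c({w2}) < c(Omega).
c = {frozenset(): 0.0, frozenset({"w1"}): 0.4,
     frozenset({"w2"}): 0.4, frozenset({"w1", "w2"}): 1.0}

stakes = [0.0, -1.0, -1.0, 0.0]   # sell a £1 bet on each of {w1} and {w2}
print(fair_price(c, stakes))               # -0.8
print([payoff(stakes, w) for w in worlds]) # [-1.0, -1.0]: below the price at every world
```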

Theorem 2 (Ramsey, de Finetti)
(I) Suppose c is not a probability function. Then there is a set of stakes, ⟨S1, ..., Sm⟩, such that, for all worlds ω,

    ∑_{j=1}^m Iω(Xj)Sj < ∑_{j=1}^m c(Xj)Sj

(II) Suppose c is a probability function. Then there is no such set of stakes.

We offer a proof below. Now, suppose p0, p1, ..., pn are probability functions over P(Ω). And suppose that there are no non-negative weights α1, ..., αn with ∑_{i=1}^n αi = 1 such that p0 = α1 p1 + ... + αn pn. Then we cannot show that using p0 to set fair prices will result in a sure loss — p0 is a probability function, and so the converse half of the Dutch Book Theorem shows that this isn’t the case. But we can show something weaker. We can show that there is a book of bets such that the price that p0 will consider fair is greater than the expected payoff of the book from the point of view of each of the individual probability functions pi, for 1 ≤ i ≤ n. That is:

Theorem 3
(I) Suppose p0 is not a weighted average of p1, ..., pn. Then there is a set of stakes, ⟨S1, ..., Sm⟩, such that, for 1 ≤ i ≤ n,

    Exp_{pi}(∑_{j=1}^m B_{Xj,Sj}) < ∑_{j=1}^m p0(Xj)Sj

(II) Suppose p0 is a weighted average of p1, ..., pn. Then there is no such set of stakes.
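A minimal numerical illustration of Theorem 3(I), with toy numbers of our own: two worlds, a single non-zero stake on the proposition X = {ω1}, and an aggregate p0 that is not a mixture of p1 and p2 (no weights give 0.7 from 0.2 and 0.4):

```python
# Two worlds; bet only on the proposition X = {w1}.
p1, p2 = 0.2, 0.4        # the individuals' probabilities for X
p0 = 0.7                 # an aggregate probability for X that is not a mixture of p1 and p2
stake = 1.0

fair_price_for_p0 = p0 * stake                     # the price p0 considers fair for the bet
expected_payoffs = [p * stake for p in (p1, p2)]   # each individual's expected payoff

print(fair_price_for_p0)   # 0.7
print(expected_payoffs)    # [0.2, 0.4] -- both strictly below p0's fair price
```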


Now, there are a number of problems with Dutch Book arguments. First, the Fair Price principle assumes that an agent’s utility is linear in money — thus, for instance, it assumes that the difference in the utilities of £5 and £10 is the same as the difference in the utilities of £1000 and £1005. But that is known to be false — agents often consider money to have diminishing marginal utility, so that the latter difference is less than the former difference. Second, the Package Principle has been questioned — I might consider a price for one bet fair and a price for another fair, but when I see that the sum of their prices is greater than the maximum payout of the book of the two of them together, I might not consider that sum a fair price for the book. So it would be preferable to present this new argument for Linear Pooling in a way that is not vulnerable to either of these concerns. We do this now. We replace Fair Price and the Package Principle with two assumptions:

Shared Utilities  All individuals in the group have the same utility function, u. Given an act A and a world ω, u(A, ω) is the utility of the outcome of act A at world ω. Given two acts A and B, we write A ⪯i B iff the expected utility of A by the lights of pi is at most the expected utility of B by the lights of pi. That is, A ⪯i B iff Exp_{pi} u(A) ≤ Exp_{pi} u(B). (Of course, it makes no sense to talk of an agent having a single utility function. Instead, what I mean here and in what follows is that each agent’s utilities can be represented by u or any positive affine transformation of u.)

Act Plenitude  For any sequence r = ⟨rω⟩_{ω∈Ω} of real numbers, one for each world ω ∈ Ω, there is an act Ar such that u(Ar, ω) = rω, for each ω.

Now we can prove the following:

Theorem 4
(I) Suppose p0 is not a weighted average of p1, ..., pn. Then there are two acts A and B such that

(a) A ≺i B, for all 1 ≤ i ≤ n
(b) A ⪰0 B

(II) Suppose p0 is a weighted average of p1, ..., pn. Then there are no two such acts.

That is, Linear Pooling is equivalent to the following unanimity preservation principle:


Pareto  Suppose p0, p1, ..., pn are probability functions defined on P(Ω), u is a utility function, and ⪯i is the expected utility ordering of acts relative to pi and u. Then, if p0 is the aggregate of p1, ..., pn:

    (∀1 ≤ i ≤ n) A ≺i B =⇒ A ≺0 B

Note: Pareto is not a modal condition — it concerns only the aggregate of the probability functions p1, ..., pn, and places no constraints on the aggregates of any other sets of probability functions. Thus, Pareto does not presuppose Functionality. Thus, we have a characterisation of Linear Pooling that does not presuppose Functionality.

It is also worth noting a related unanimity preservation principle that Linear Pooling does not satisfy:

Pareto∗  Suppose p0, p1, ..., pn are probability functions defined on P(Ω), u is a utility function, and ⪯i is the expected utility ordering of acts relative to pi and u. Then, if p0 is the aggregate of p1, ..., pn:

    (∀1 ≤ i ≤ n)(∃Ai) B ≺i Ai =⇒ (∃A0) B ≺0 A0

That is, if each individual strictly prefers some act or other to B, then the aggregate strictly prefers some act to B. To see that Linear Pooling violates this, consider the following decision, inspired by the Miners Paradox (Parfit, ms):

         ω1    ω2
    A    10     0
    B     6     6
    C     0    10

Now suppose p1 and p2 are two individual probability functions over {ω1, ω2} and p0 is the straight average of the two of them, so that p0 = (1/2)p1 + (1/2)p2:

         ω1    ω2
    p1    1     0
    p2    0     1
    p0   1/2   1/2

Then B ≺1 A and B ≺2 C, so each individual strictly prefers some act to B. But A ≺0 B, C ≺0 B, and A ∼0 C, so there is no act that p0 strictly prefers to B. So aggregates that satisfy Linear Pooling violate Pareto∗.
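The expected utilities behind these verdicts can be checked directly; the following sketch simply recomputes them from the two tables above:

```python
# The decision table above: utilities of each act at each world.
utility = {"A": {"w1": 10, "w2": 0},
           "B": {"w1": 6,  "w2": 6},
           "C": {"w1": 0,  "w2": 10}}

credences = {"p1": {"w1": 1.0, "w2": 0.0},
             "p2": {"w1": 0.0, "w2": 1.0},
             "p0": {"w1": 0.5, "w2": 0.5}}   # the straight average of p1 and p2

def exp_utility(p, act):
    return sum(p[w] * utility[act][w] for w in ("w1", "w2"))

for name, p in credences.items():
    print(name, {act: exp_utility(p, act) for act in "ABC"})
# p1: A=10, B=6, C=0   -> p1 strictly prefers A to B
# p2: A=0,  B=6, C=10  -> p2 strictly prefers C to B
# p0: A=5,  B=6, C=5   -> B is optimal for the aggregate: no act beats it
```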

3  Conclusion

We have presented a novel characterisation of Linear Pooling that does not presuppose Functionality. An aggregate probability function satisfies Linear Pooling — that is, it is a weighted average of the individual probability functions — if, and only if, it satisfies Pareto — that is, if it prefers one act to another whenever all the individuals do. Of course, while Pareto is plausible, there are other conditions on aggregates that are also plausible but which Linear Pooling does not satisfy. For instance, aggregates that satisfy Linear Pooling almost never preserve unanimous judgments of independence amongst the individuals (Laddaga, 1977; Elkin & Wheeler, 2016). Thus, our present characterisation does not provide a compelling argument for Linear Pooling unless it is coupled with an argument that such independence preservation is hardly ever more desirable than satisfying Pareto. We leave such an argument for another time.
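For readers who want to see the independence point concretely, here is a small numerical illustration in the spirit of the cited results; the example is our own, not taken from Laddaga or Elkin & Wheeler. Each individual treats X and Y as independent, but their straight average does not:

```python
worlds = [(x, y) for x in (0, 1) for y in (0, 1)]

def product_measure(px, py):
    # A distribution under which X = {x=1} and Y = {y=1} are independent.
    return {(x, y): (px if x else 1 - px) * (py if y else 1 - py) for (x, y) in worlds}

def prob(p, event):
    return sum(q for w, q in p.items() if event(w))

X = lambda w: w[0] == 1
Y = lambda w: w[1] == 1
XY = lambda w: X(w) and Y(w)

p1 = product_measure(0.2, 0.2)
p2 = product_measure(0.8, 0.8)
p0 = {w: 0.5 * p1[w] + 0.5 * p2[w] for w in worlds}   # the linear pool

for p, name in [(p1, "p1"), (p2, "p2"), (p0, "p0")]:
    print(name, prob(p, XY), prob(p, X) * prob(p, Y))
# p1: 0.04 vs 0.04 and p2: 0.64 vs 0.64 (independence), but p0: 0.34 vs 0.25
```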

4  Proofs

In this section, we prove the mathematical results, Theorems 2, 3, and 4. First, as above, suppose P(Ω) = {X1, ..., Xm}. Now:

• We represent a credence function c defined on P(Ω) as a vector in R^m, namely, c = ⟨c(X1), ..., c(Xm)⟩.
• We represent the book of bets ∑_{j=1}^m B_{Xj,Sj} as a vector in R^m, namely, S = ⟨S1, ..., Sm⟩.

Lemma 5
(i) If ω ∈ Ω, then the payoff of the book of bets ∑_{j=1}^m B_{Xj,Sj} at ω is

    S · Iω = ∑_{j=1}^m Iω(Xj)Sj = ∑_{j=1}^m B_{Xj,Sj}(ω)

(ii) An agent with credence function c will consider c(Xj)Sj the fair price for a bet on Xj with stake Sj (Fair Price), and thus will consider

    S · c = ∑_{j=1}^m c(Xj)Sj

the fair price for the book of bets ∑_{j=1}^m B_{Xj,Sj} with stakes S = ⟨S1, ..., Sm⟩ (Package Principle).

Lemma 6  The set of probability functions over P(Ω) is the convex hull of the set of characteristic functions of worlds in Ω. That is, a credence function c defined on P(Ω) is a probability function iff there are non-negative weights ⟨αω⟩_{ω∈Ω} with ∑_{ω∈Ω} αω = 1 such that

    c(−) = ∑_{ω∈Ω} αω Iω(−)

That is, ProbΩ = {Iω : ω ∈ Ω}+.

Proof of Lemma 6.
• First, we show that ProbΩ ⊆ {Iω : ω ∈ Ω}+. Suppose p is a probability function. Then let αω = p(ω). Then p(X) = ∑_{ω∈X} p(ω) = ∑_{ω∈X} αω = ∑_{ω∈Ω} αω Iω(X).
• Second, we show that {Iω : ω ∈ Ω}+ ⊆ ProbΩ. First, note that each Iω is a probability function: (i) Iω(Ω) = 1; (ii) Iω(X ∨ Y) = Iω(X) + Iω(Y) − Iω(XY), for all X, Y ⊆ Ω. Second, note that, if p and q are probability functions on P(Ω), then so is any mixture αp + (1 − α)q of them, where 0 ≤ α ≤ 1. □

That completes the proof of Lemma 6.

Lemma 7  Suppose c is a credence function on P(Ω) and X is a finite set of credence functions on P(Ω). Then, if c ∉ X+, there is a vector S such that, for all x in X,

    S · x < S · c

Proof of Lemma 7. Suppose c ∉ X+. Then let c∗ be the closest point in X+ to c, and let S = c − c∗. Then, for any x in X, the angle θ between S and x − c is obtuse and thus cos θ < 0 (see Figure 1 below). So, since S · (x − c) = ||S|| ||x − c|| cos θ and ||S||, ||x − c|| > 0, we have S · (x − c) < 0. And hence S · x < S · c. □

Lemma 8  Suppose p is a probability function on P(Ω). Then the expected payoff of the book of bets ∑_{j=1}^m B_{Xj,Sj} with stakes S = ⟨S1, ..., Sm⟩ by the lights of p is

    Exp_p(∑_{j=1}^m B_{Xj,Sj}) = ∑_{ω∈Ω} p(ω)(S · Iω) = ∑_{ω∈Ω} ∑_{j=1}^m p(ω) Iω(Xj) Sj = ∑_{j=1}^m p(Xj) Sj = S · p
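The following sketch illustrates the construction in the proof of Lemma 7 with toy numbers of our own. For readability it identifies each credence function with its values on the two singleton propositions, and since X+ is then just a line segment, the closest point c∗ has a simple closed form:

```python
import numpy as np

# Credence functions as vectors of probabilities for the propositions {w1}, {w2}.
p1 = np.array([0.2, 0.8])
p2 = np.array([0.4, 0.6])
c  = np.array([0.7, 0.3])   # not in the convex hull of {p1, p2}

# Closest point c* to c in the hull (a segment here, so project and clamp
# the coefficient of the direction p2 - p1 to [0, 1]).
d = p2 - p1
t = np.clip(np.dot(c - p1, d) / np.dot(d, d), 0.0, 1.0)
c_star = p1 + t * d

S = c - c_star   # the separating vector of Lemma 7 (read as a vector of stakes)
for x in (p1, p2):
    assert np.dot(S, x) < np.dot(S, c)
print(S, np.dot(S, p1), np.dot(S, p2), np.dot(S, c))
```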

Theorem 2(I) follows from Lemmas 5, 6 and 7 when we let X = {Iω : ω ∈ Ω}. Theorem 2(II) holds because, when c is a probability function, the fair price she will pay for a bet is equal to her expectation of the payout of that bet. Thus, the payout cannot be less than the fair price at every world, for then the expectation of the former would be less than the latter, not equal to it. □

Theorem 3(I) follows from Lemmas 5, 7 and 8 when we let c = p0 and X = {p1, ..., pn}. Theorem 3(II) holds because, when p0 is a weighted average of p1, ..., pn, its expectation of the payout of a bet — and thus its fair price for the bet — is a weighted average of the expectations of the payout by the lights of p1, ..., pn — and thus of their fair prices for the bet. Thus, the expected payout of the bet by the lights of pi cannot be less than p0’s fair price, for each 1 ≤ i ≤ n, for then the weighted average of the former would be less than the latter, not equal to it. □

Theorem 4(I) follows from Lemma 7 when we let c = p0 and X = {p1, ..., pn}, and we define the acts A and B as follows (they exist by Act Plenitude):

• u(A, ω) = S · Iω − S · p0 + ε
• u(B, ω) = 0

where ε > 0 is specified so that S · pi − S · p0 + ε < 0 for all 1 ≤ i ≤ n. This is possible since, by Lemma 7, S · pi − S · p0 < 0 for all 1 ≤ i ≤ n. After all:

    Exp_p u(A) = ∑_{ω∈Ω} p(ω)(S · Iω − S · p0 + ε) = ∑_{ω∈Ω} p(ω)(S · Iω) − S · p0 + ε = S · p − S · p0 + ε

So:

• Exp_{p0} u(A) = S · p0 − S · p0 + ε = ε > 0 = Exp_{p0} u(B)
• Exp_{pi} u(A) = S · pi − S · p0 + ε < 0 = Exp_{pi} u(B)

Theorem 4(II) holds because, when p0 is a weighted average of p1, ..., pn, its expectation of the utility of an act is a weighted average of the expectations of that utility by the lights of p1, ..., pn. Thus, if A ≺i B for each 1 ≤ i ≤ n, then Exp_{pi} u(A) < Exp_{pi} u(B) for each 1 ≤ i ≤ n, and thus

    Exp_{p0} u(A) = ∑_{i=1}^n αi Exp_{pi} u(A) < ∑_{i=1}^n αi Exp_{pi} u(B) = Exp_{p0} u(B). □

That completes the proof.
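Continuing the toy numbers from the previous sketch, this verifies the construction of the acts A and B in the proof of Theorem 4(I): the aggregate strictly prefers A to B, while each individual strictly prefers B to A. The particular numbers and the value of ε are, again, our own choices:

```python
import numpy as np

# Same toy numbers as before: p0 lies outside the hull of p1 and p2, and
# S is the separating vector delivered by Lemma 7.
p0, p1, p2 = np.array([0.7, 0.3]), np.array([0.2, 0.8]), np.array([0.4, 0.6])
S = np.array([0.3, -0.3])
indicators = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # I_w1, I_w2

eps = 0.05   # any 0 < eps < min_i (S.p0 - S.p_i) will do

# The acts from the proof: u(A, w) = S.I_w - S.p0 + eps, u(B, w) = 0.
u_A = np.array([S @ I - S @ p0 + eps for I in indicators])
u_B = np.zeros(2)

for name, p in [("p0", p0), ("p1", p1), ("p2", p2)]:
    print(name, p @ u_A, p @ u_B)
# p0: 0.05 > 0  -- the aggregate strictly prefers A to B
# p1: -0.25 < 0 and p2: -0.13 < 0 -- every individual strictly prefers B to A
```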

[Figure 1: the credence function c, its closest point c∗ in X+, the separating vector S = c − c∗, a point x in X, and the obtuse angle θ between S and x − c.]

References

Arrow, K. J. (1951). Social Choice and Individual Values. New York: Wiley.

Bright, L. K., Dang, H., & Heesen, R. (2017). A Role for Judgment Aggregation in Coauthoring Scientific Papers. Erkenntnis.

Elkin, L., & Wheeler, G. (2016). Resolving Peer Disagreements Through Imprecise Probabilities. Noûs, doi: 10.1111/nous.12143.

Laddaga, R. (1977). Lehrer and the consensus proposal. Synthese, 36, 473–477.

Parfit, D. (ms). What We Together Do.

Pettigrew, R. (ta). On the Accuracy of Group Credences. In T. S. Gendler & J. Hawthorne (Eds.), Oxford Studies in Epistemology, vol. 6. Oxford: Oxford University Press.

Russell, J. S., Hawthorne, J., & Buchak, L. (2015). Groupthink. Philosophical Studies, 172, 1287–1309.

Wagner, C. (1982). Allocation, Lehrer models, and the consensus of probabilities. Theory and Decision, 14, 207–220.

Wagner, C. (1984). Aggregating subjective probabilities: some limitative theorems. Notre Dame Journal of Formal Logic, 25(3), 233–240.

