Scoring Rules and the Inevitability of Probability

Author(s): Dennis V. Lindley
Source: International Statistical Review / Revue Internationale de Statistique, Vol. 50, No. 1 (Apr., 1982), pp. 1-11
Published by: International Statistical Institute (ISI)
Stable URL: http://www.jstor.org/stable/1402448

International Statistical Review 50 (1982), pp. 1-26. © International Statistical Institute
Longman Group Limited / Printed in Great Britain

Scoring Rules and the Inevitability of Probability

Dennis V. Lindley
2 Periton Lane, Minehead, Somerset, TA24 8AQ, England

Summary

Let a person express his uncertainty about an event $E$, conditional upon an event $F$, by a number $x$ and let him be given, as a result, a score which depends on $x$ and the truth or falsity of $E$ when $F$ is true. It is shown that if the scores are additive for different events and if the person chooses admissible values only, then there exists a known transform of the values $x$ to values which are probabilities. In particular, it follows that values $x$ derived by significance tests, confidence intervals or by the rules of fuzzy logic are inadmissible. Only probability is a sensible description of uncertainty.

Key words: Admissibility; Confidence statements; Expectation; Finite additivity; Fuzzy logic; Pareto optimality; Possibilities; Proper rules; Significance tests; Upper and lower probabilities.

1 Introduction

Suppose that a person, considering an event $E$ about which he is uncertain, describes that uncertainty by a number $x$. De Finetti's (1974, Ch. 4) basic argument is that if the person is scored an amount $(x - 1)^2$ if $E$ is true and $x^2$ if $E$ is false, and if the scores for different events are additive, then $x$ must be a probability for $E$. This result has been generalized to some other scores besides the quadratic one: a seminal paper is that by Savage (1971), which contains several references. In the present paper we show that De Finetti's argument applies to virtually every reasonable score function, with the only modification that a known transform of $x$, rather than $x$ itself, must be a probability.

The argument may be viewed as providing another axiomatic justification for the Bayesian position, advantages being the simplicity both of the assumptions and of the proof. It also demonstrates that any description of uncertainty by numbers that do not obey the rules of the probability calculus, even after transformation, will violate the simple assumptions we make. Examples of such nonprobabilistic assignments are significance levels, confidence statements and possibilities in fuzzy logic. The argument is extended to where the description is by means of two numbers, perhaps upper and lower probabilities, as suggested by Dempster (1968) and Smith (1961), to demonstrate that these are in disagreement with the assumptions. The message is essentially that only probabilistic descriptions of uncertainty are reasonable.

2 Assumptions

Notation. We consider real variables $X, Y, \ldots$ taking values $x, y, \ldots$. Events are denoted by $E, F, \ldots$ and the same symbol is used for the indicator variable of an event, so that $E = 1$ $(0)$ if $E$ is true (false). If $f(X, E)$ is a function of the variables $X$ and $E$, $f(x, 1)$ is the value of that function when $X$ takes the value $x$ and $E$ is true; $f'(X, E)$ is the derivative of that function with respect to $X$.

Score assumption. For a given score function $f(X, E)$, a person who describes his uncertainty about $E$, conditional on $F$, by a real number $x$ will receive a score $f(x, E)F$. The scores are additive in that if $x_i$ refers to $E_i$ conditional on $F_i$ for $i = 1, \ldots, n$, then the total score for all these descriptions will be $\sum f(x_i, E_i)F_i$, where the sum is over $i = 1, \ldots, n$.

We consider the question of what are reasonable values for him to choose. A score may be thought of as a reward or as a penalty. For definiteness we shall think of it as a penalty, so that the person wishes to reduce his score.

Admissibility assumption. A person will not choose values $x_i$ for $E_i$ conditional on $F_i$ $(i = 1, \ldots, n)$ if there exist values $y_1, \ldots, y_n$ such that
$$\sum f(y_i, E_i)F_i \le \sum f(x_i, E_i)F_i \qquad (1)$$
for all values of the indicator variables, and strict inequality holds for some values.

If the conditions do obtain, he could reduce his penalty in some circumstances without increasing it in any. In statistical language the set $(x_1, \ldots, x_n)$ is inadmissible and the assumption says that only admissible values will be selected.

Origin and scale assumption. For the uncertainty of $E$ $(\bar{E})$ conditional on $E$, there exists a unique admissible value $x_T$ $(x_F)$ the same for all $E$; and $x_F \ne x_T$.

The suffix $T$ $(F)$ denotes true (false). Without loss of generality we suppose $x_F < x_T$.

Regularity assumptions. Variable $X$ can assume all values in a closed interval $I$ of the real line. Derivative $f'(X, E)$ exists, is continuous in $X$ for each $E$ and, for both $E = 1$ and $E = 0$, vanishes at most once. Also $x_F$ and $x_T$ are interior points of $I$.

These regularity assumptions are unnecessarily restrictive and are later relaxed. Our reason for introducing them in this form is that the proof is then unencumbered with side-issues that might otherwise obscure the argument. We first prove three lemmas.

3 Lemmas

LEMMA 1. All values in the closed interval $[x_F, x_T]$ are admissible, and values outside are inadmissible. The function
$$P(x) = \frac{f'(x, 0)}{f'(x, 0) - f'(x, 1)} \qquad (2)$$
satisfies $0 \le P(x) \le 1$ in $[x_F, x_T]$, is continuous and $P(x_F) = 0$, $P(x_T) = 1$.

In particular the [...] unique admissible value and therefore minimizes this function. By the regularity assumption $f'(x_T, 1) = 0$. Similarly for $\bar{E}$ conditional on $E$, $f'(x_F, 0) = 0$. Again by the regularity assumption $f'(x, 1) > (<)\ 0$ for $x > (<)\ x_T$ and $f'(x, 0) > (<)\ 0$ for $x > (<)\ x_F$: in particular, $f'(x_T, 0) > 0$ and $f'(x_F, 1) < 0$.

For $E$ conditional on $F$ the score will be $f(x, 1)$ if $EF = 1$ and $f(x, 0)$ if $(1 - E)F = 1$, and otherwise zero. If $f'(x, 1)$ and $f'(x, 0)$ are both strictly positive (negative), $x$ is inadmissible since a small decrease (increase) in $x$ will reduce both scores. Combining this with the result in the final sentence of the last paragraph, we see that only values in $[x_F, x_T]$ are admissible. All values in $[x_F, x_T]$ are admissible since any decrease from $x$, although it will lower $f(x, 0)$, will necessarily increase $f(x, 1)$: and similarly for an increase from $x$. The properties claimed for $P(x)$ all easily follow from the results already established.
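As a minimal sketch of equation (2), the following Python helper builds the transform $P(x)$ from the two derivative functions and evaluates it for the quadratic rule of the Introduction, for which $x_F = 0$, $x_T = 1$ and one expects $P(x) = x$. The helper name `prob_transform` and its calling convention are illustrative assumptions, not notation from the paper.

```python
def prob_transform(df0, df1):
    """Build P(x) = f'(x,0) / (f'(x,0) - f'(x,1)), as in equation (2).

    df0 and df1 are callables returning f'(x, 0) and f'(x, 1).
    The name and signature are hypothetical, chosen for illustration.
    """
    def P(x):
        return df0(x) / (df0(x) - df1(x))
    return P


# Quadratic rule: f(x, 1) = (x - 1)^2 and f(x, 0) = x^2.
P_quadratic = prob_transform(lambda x: 2.0 * x, lambda x: 2.0 * (x - 1.0))

for x in (0.0, 0.25, 0.5, 1.0):
    # For this rule the transform reproduces x itself.
    print(x, P_quadratic(x))
```

Any other differentiable score function can be plugged in the same way, provided the denominator does not vanish on the interval of interest.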


LEMMA 2. The values $x$ for $E$ and $y$ for $\bar{E}$, both conditional on $F$, being admissible imply $P(x) + P(y) = 1$.

The total scores in the two possible cases will be:
$$EF = 1: \quad f(x, 1) + f(y, 0),$$
$$(1 - E)F = 1: \quad f(x, 0) + f(y, 1).$$
(If other assessments have been made the appropriate scores need to be added to these expressions, but it is easily seen that this will not affect the argument that follows, so that the assessments for $E$ and $\bar{E}$ can be considered in isolation.) Consider small changes in $x$ to $(x + h)$ and $y$ to $(y + k)$. The resulting changes in these scores will be, to order $h$ and $k$,
$$f'(x, 1)h + f'(y, 0)k, \qquad f'(x, 0)h + f'(y, 1)k.$$
Both these changes could be made negative, so reducing both scores and making $(x, y)$ inadmissible, by solving the linear equations in $h$ and $k$ obtained by equating these to small, selected, negative values. The only exception to this occurs when the determinant of the linear equations vanishes. The condition for this is that $f'(x, 1)f'(y, 1) = f'(x, 0)f'(y, 0)$, or $P(x) + P(y) = 1$.

This argument fails at boundary points because the values of $h$ or $k$ required to reduce both scores may not be permissible. Consider the case $x = x_T$, where $h < 0$ and $f'(x, 1) = 0$. If $y \ne x_F$, so that $f'(y, 0) > 0$, the first change is $f'(y, 0)k$ and for this to be negative, $k < 0$. Since $f'(x, 0) > 0$, the second change can be made negative. Hence $x = x_T$ is inadmissible unless $y = x_F$, when the first change is necessarily positive and $P(x_T) + P(x_F) = 1$. Other boundary values follow similarly.

LEMMA 3. The values $x$ for $F$ conditional on $G$, $y$ for $E$ conditional on $FG$, and $z$ for $EF$ conditional on $G$, being admissible imply $P(z) = P(x)P(y)$.

The method of proof follows that of Lemma 2. The total scores in the three possible cases will be:
$$EFG = 1: \quad f(x, 1) + f(y, 1) + f(z, 1),$$
$$(1 - E)FG = 1: \quad f(x, 1) + f(y, 0) + f(z, 0),$$
$$(1 - F)G = 1: \quad f(x, 0) + f(z, 0).$$
Consider small changes in $x$, $y$ and $z$; then these can result in changes in the three total scores that are all negative, so making $(x, y, z)$ inadmissible, unless the determinant of the linear equations is zero. Simple calculation establishes that the determinant is
$$[f'(x, 0) - f'(x, 1)]\,[f'(y, 0) - f'(y, 1)]\,[f'(z, 0) - f'(z, 1)]\,[P(x)P(y) - P(z)].$$
The first three factors do not vanish by results established in the proof of Lemma 1. Hence the last factor vanishes, as required. The boundary values require special consideration as in Lemma 2; details are omitted.
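The two determinant conditions used in Lemmas 2 and 3 can be checked with exact rational arithmetic. The sketch below treats the derivative values $f'(\cdot, 0) > 0$ and $f'(\cdot, 1) < 0$ as free numbers, draws them at random, and compares each determinant with its factored form. The function names and the sampling scheme are assumptions made for illustration; this is a spot check, not a proof.

```python
from fractions import Fraction
import random


def P(d0, d1):
    """Transform (2) written in terms of derivative values d0 = f'(., 0), d1 = f'(., 1)."""
    return d0 / (d0 - d1)


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def lemma2_gap(x0, x1, y0, y1):
    """Determinant of the Lemma 2 system minus its factored form."""
    det = x1 * y1 - x0 * y0
    factored = (x0 - x1) * (y0 - y1) * (1 - P(x0, x1) - P(y0, y1))
    return det - factored


def lemma3_gap(x0, x1, y0, y1, z0, z1):
    """Determinant of the Lemma 3 system minus its factored form.

    Rows are the cases EFG = 1, (1 - E)FG = 1, (1 - F)G = 1;
    columns are the coefficients of the changes in x, y, z.
    """
    m = [[x1, y1, z1],
         [x1, y0, z0],
         [x0, Fraction(0), z0]]
    factored = ((x0 - x1) * (y0 - y1) * (z0 - z1)
                * (P(x0, x1) * P(y0, y1) - P(z0, z1)))
    return det3(m) - factored


rng = random.Random(0)


def draw():
    """One (positive, negative) pair of exact rational derivative values."""
    pos = Fraction(rng.randint(1, 9), rng.randint(1, 9))
    neg = -Fraction(rng.randint(1, 9), rng.randint(1, 9))
    return pos, neg


gaps = []
for _ in range(100):
    (x0, x1), (y0, y1), (z0, z1) = draw(), draw(), draw()
    gaps.append(lemma2_gap(x0, x1, y0, y1))
    gaps.append(lemma3_gap(x0, x1, y0, y1, z0, z1))

print(all(g == 0 for g in gaps))
```

Because `Fraction` arithmetic is exact, a zero gap here means the two expressions agree exactly at the sampled points, so the determinant vanishes precisely when $P(x) + P(y) = 1$ (Lemma 2) or $P(z) = P(x)P(y)$ (Lemma 3).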

4 Main results

THEOREM 1. The four assumptions listed above imply that the values $x$ describing uncertainty will be such that the transforms $P(x)$ obey the laws of finitely-additive probability.

Lemma 1 establishes the convexity property that $0 \le P(x) \le 1$ and $P(x_F) = 0$. Lemma 2 is the additive property. Lemma 3 is the multiplicative property.

The theorem states that admissibility implies probability, through a transform of the stated value, but not the converse. To consider this, suppose that a person has probability $p$ for $E$ and considers that the relevant quantity is his expected score
$$p f(x, 1) + (1 - p) f(x, 0).$$
He will minimize this over $x$ with the result that $p = P(x)$, in accord with the theorem. The same argument applies in the circumstances of Lemmas 2 and 3. Minimization of expectation gives an admissible result, so that we can state the Corollary.

COROLLARY. If the equation in $x$, $P(x) = p$, has a unique root for all 0 [...]
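The claim that minimizing the expected score yields $p = P(x)$ can be illustrated numerically. The sketch below minimizes $p f(x, 1) + (1 - p) f(x, 0)$ by ternary search for two rules: the quadratic rule of the Introduction and the fourth-power rule $f(x, 1) = (x - 1)^4$, $f(x, 0) = x^4$ discussed in Comment 2, whose transform is $P(x) = x^3/(3x^2 - 3x + 1)$. The search routine and all function names are assumptions chosen for illustration; ternary search is used only because both expected scores are convex on $[0, 1]$.

```python
def argmin_unimodal(g, lo, hi, iters=200):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def stated_value(p, f):
    """Value x minimizing the expected score p*f(x,1) + (1-p)*f(x,0) on [0, 1]."""
    return argmin_unimodal(lambda x: p * f(x, 1) + (1 - p) * f(x, 0), 0.0, 1.0)


def quadratic(x, e):
    """Quadratic rule: (x - 1)^2 if E is true (e = 1), x^2 if false (e = 0)."""
    return (x - e) ** 2


def fourth_power(x, e):
    """Fourth-power rule: (x - 1)^4 if E is true, x^4 if false."""
    return (x - e) ** 4


def P_fourth(x):
    """Probability transform of the fourth-power rule."""
    return x ** 3 / (3 * x ** 2 - 3 * x + 1)


p = 0.3
x_quadratic = stated_value(p, quadratic)
x_fourth = stated_value(p, fourth_power)

# The quadratic rule returns p itself; the fourth-power rule returns a
# different x, but its transform P(x) recovers p.
print(x_quadratic, x_fourth, P_fourth(x_fourth))
```

Under the fourth-power rule the stated value differs from the probability, yet applying the transform brings it back to the probability scale, which is the sense in which the transform is "known".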

If $P(x) = p$ has a unique root, we shall refer to the scoring rule as single-valued. If it has multiple roots we have the possibilities of probability rules giving inadmissible values, or of admissible values not obtained through minimization of an expectation. (Examples below show that both possibilities can occur.) Our next result enhances the status of the probability transform of $x$.

LEMMA 4. If, in considering the uncertainty of $E$ conditional on $F$ with score function $f(X, E)F$, a person gives $x$; and with score function $g(Y, E)F$, gives $y$: then $P(x) = Q(y)$.

Here $Q(y) = g'(y, 0)/\{g'(y, 0) - g'(y, 1)\}$, see (2), and the result says that if the score function is changed the probability transform is invariant. The proof uses the method of Lemmas 2 and 3. The first-order changes will be
$$EF = 1: \quad f'(x, 1)h + g'(y, 1)k,$$
$$(1 - E)F = 1: \quad f'(x, 0)h + g'(y, 0)k,$$
and the determinant necessarily being zero gives $P(x) = Q(y)$.

It follows that a person could proceed by choosing his probability $p$ in advance of knowing what score function was to be used and then, when it was announced, providing $x$ satisfying $P(x) = p$. Robert Nau has pointed out to me that in the proof of the theorem there is no need for the score function to be the same for each event considered: each value can be transformed by its own probability transform to give a probability value. The next result shows that any probability transform is possible.

LEMMA 5. For any function $P(x)$ having the properties described in Lemma 1, there exists a score function with $P(x)$ as probability transform.

For example, let $f(x, 0) = (x - x_F)^2$, the quadratic function. Then from (2)
$$f'(x, 1) = 2(x - x_F)\,[P(x) - 1]/P(x),$$
which, with the boundary condition $f'(x_T, 1) = 0$, yields a solution for $f(x, 1)$ satisfying the regularity conditions.

In all the discussion so far the only ordering of scores has been based on admissibility and shows that the probability-transforms include all admissible values. But when a person selects a value $x$ to describe his uncertainty he is using more than admissibility: he is selecting one value out of all admissible ones. In particular, in the case of multiple roots, he is selecting amongst $x$ values that yield the same probability. Our next assumption concerns this additional ordering.

Invariance assumption. Any preferences amongst scores do not depend on the score function being used, and such preferences are transitive.

THEOREM 2. The five assumptions listed above imply that the values $x$ describing uncertainty will be such that the transforms $P(x)$ obey the laws of finitely-additive probability and conversely that any $x$ may be attained by selecting probabilities and minimizing the expected score.

Only the second part requires proof (the first is Theorem 1) and we use Fig. 1. Here the axes are the scores $f(x, 1)$ and $f(x, 0)$ and the solid curve describes these coordinates as $x$ varies from $x_F$ (at F) to $x_T$ (at T). (It is actually the curve of a quartic rule to be described below, but will serve for the proof.) This curve will have slope $f'(x, 0)/f'(x, 1) = -P(x)/\{1 - P(x)\}$, always negative and varying continuously from zero at F to $-\infty$ at T. By the remark above, any curve with these properties can be obtained from suitable $f$'s. The points A and B are both admissible and have the same transforms $P(x)$: the tangents at A and B have the same slope. The dotted curve corresponds to another score function which is single-valued and passes through A with the same slope. On this curve there clearly exists a point C with both scores less than those of B. Now consider an event with probability $p$. With the single-valued score function, A is preferred to C. By admissibility C is preferred to B. Hence, by the invariance assumption, A is preferred to B with the original score function. This argument is available for any point, like A, that minimizes the expected score, and the theorem is established. (Although this argument deals with only one event, it is general because other events will only cause additions to the scores, $f(x, 1)$ and $f(x, 0)$, and will be unaffected by changes in $x$. Notice also that the slope of the tangent for admissible probability values is monotone in $x$, so that $P(x)$ is also monotone for those values.)

Figure 1. $f(x, E) = \tfrac{1}{16}x^4 - \tfrac{3}{8}x^2 + (\tfrac{1}{2} - E)x$ $(-2 \le x \le 2)$. (Axes: the scores $f(x, 1)$ and $f(x, 0)$; the curve runs from F to T.)

The additivity assumption can be relaxed. For example, multiplication could be used, but on replacing the score by its logarithm we are back to addition. It is most emphatically not true that additivity produces the addition rule of probability: as a referee remarked, it is not 'additivity in, additivity out'.

The origin and scale assumption ensures both that a person will have no ambiguity in stating his value for an event known to be true (or false) and that he distinguishes truth from falsehood. One can have score functions with several minima and, in particular, several possible descriptions of a sure event. This leads to ambiguities which can be resolved in the sense that they all lead to the same probability, namely one, after transformation. No advantage seems to accrue from such flexibility.

The regularity assumption requires considerable discussion. The existence and continuity of the derivatives is introduced in order to avoid abrupt changes in the score. The nonvanishing of the derivatives, except at $x_F$ and $x_T$, is a slight strengthening of the natural requirement that, at least for admissible values, the score function does not take the same value for two different choices $x_1$ and $x_2$; for if it did, there would be no rationale for choosing between $x_1$ and $x_2$ and again there would be ambiguity. The unnecessarily severe restriction is that $x_F$ and $x_T$ are interior points, introduced to ensure that the minima are obtained by the differential calculus, a condition that need not obtain on boundary points.
We consider the case where $x_F$ is a boundary point: an analogous treatment applies at T. The major difference now is that we do not necessarily have $f'(x_F, 0) = 0$. Suppose that we add the condition that
$$\lim_{x \downarrow x_F} f'(x, 0)/f'(x, 1) = 0$$
and require that $f'(x, 0) > 0$ for $x > x_F$. The effect of the limit condition is to make $\lim P(x) = 0$ as $x \downarrow x_F$. It is then straightforward to show that the properties of $P(x)$ proved in Lemma 1 still obtain, as do the boundary features considered in Lemmas 2 and 3. The curve of admissible values used in the proof of Theorem 2 will still have zero slope at $x_F$ and the argument used there carries over. Consequently both theorems remain true. We therefore restate the regularity assumption as follows.

Regularity Assumption. Variable $X$ can assume all values in a closed interval $I$ of the real line. Derivative $f'(X, E)$ exists and is continuous in $X$ for each $E$. For $x > (<)\ x_F$, $f'(x, 0) > (<)\ 0$: for $x > (<)\ x_T$, $f'(x, 1) > (<)\ 0$. Then either $x_F$ is an interior point of $I$ or
$$\lim_{x \downarrow x_F} f'(x, 0)/f'(x, 1) = 0.$$
Also either $x_T$ is interior or
$$\lim_{x \uparrow x_T} f'(x, 1)/f'(x, 0) = 0.$$

Under these conditions Theorems 1 and 2 persist.

5 Discussion

Comment 1. Throughout the discussion we have referred to uncertainty of $E$ conditional on $F$ because conditional assessment is the general form. If the person knows that $F$ is true then we may speak of the uncertainty of $E$. It should be remembered that the full force of the phrase 'conditional on $F$' is 'were the person to be told that $F$ is true'. He is assessing the situation now and scores are only nonzero, and therefore of concern to him, when $F = 1$. He need only consider the case $F = 1$ but does not need to know that $F = 1$.

Comment 2. A scoring rule is proper if it leads directly to a probability: that is, if $P(x) = x$ or
$$x f'(x, 1) = (x - 1) f'(x, 0).$$
The quadratic rule used by De Finetti has $f(x, 1) = (x - 1)^2$ and $f(x, 0) = x^2$ and is clearly proper. As an example of an improper rule consider $f(x, 1) = (x - 1)^4$ and $f(x, 0) = x^4$, in which the fourth powers replace the squares of the proper rule. Function $P(x)$ is then $x^3/(3x^2 - 3x + 1)$ and $P(x) = p$ is a cubic in $x$ with a unique root for any $0 < p < 1$.

The quartic rule, suggested to me by Robert Nau,
$$f(x, E) = \tfrac{1}{16}x^4 - \tfrac{3}{8}x^2 + (\tfrac{1}{2} - E)x,$$
provides an example for which $P(x) = p$ has multiple roots, that is, is not single-valued. Here $x_F = -2$, $x_T = +2$, the regularity conditions are obeyed with these as interior points and
$$P(x) = (x + 2)(x - 1)^2/4,$$
a cubic with three roots in $x$ for every $p$, $0 < p < 1$. It is the scores for a single event with this rule that are graphed in the figure. As $x$ decreases from $x_T = 2$ the scores move from T along the curve to the point a when $x = \sqrt{3}$. These points lie on a convex part of the curve and can be obtained by minimizing the expected score. As $x$ decreases further the curve remains convex until at $x = 1$ it reaches the point b; but these points, though admissible, cannot be obtained by a minimization of the expected score and are dominated by points near F ($x$ near $x_F$) having the same tangent slope. Between $x = 1$ and $x = 0$, when the curve reaches the origin, the curve is concave but the values are still admissible though again dominated by values near $x_F$. The situation repeats itself between the origin and F with $-x$ for $x$ and T for F.

Only values $3 \le x^2 \le 4$ are satisfactory and can be obtained by minimizing the expected score. Between $-2$ and $-\sqrt{3}$, $P(x)$ increases monotonically from 0 to $\tfrac{1}{2}$: between $+\sqrt{3}$ and $+2$ it similarly passes from $\tfrac{1}{2}$ to 1. The stated value has a discontinuity as $p$ passes through $\tfrac{1}{2}$. It is generally true that the condition for convexity is $P'(x) > 0$: this obtains here with $1 < x^2 \le 4$. The remaining values, $|x| < 1$, give points on the concave part of the curve.

If the scores are plotted for $E$ and $\bar{E}$ (see Lemma 2) then the curve $P(x) + P(y) = 1$ again gives the three types of points just considered, i.e. minimizing an expected score, convex but not obtained by minimization, concave; it also has points which are inadmissible. These latter arise when $|x| < 1$ and $y = -x$.

Comment 3. The regularity assumptions are all obviously reasonable except those on the limits at $x_T$ and $x_F$ when they are not interior points. Consider what happens when they do not hold; specifically, suppose
$$\lim_{x \downarrow x_F} f'(x, 0)/f'(x, 1) < 0, \quad \text{or} \quad \lim_{x \downarrow x_F} P(x) = a > 0.$$
This implies $x$ must be chosen so that $P(x) > a$ or is zero. But Lemma 3 shows that this implies $P(x) > a^{1/2}$, and so on. Hence all values of $x$ must be such that $P(x)$ is 0 or 1: that is, the only admissible values are $x = x_F$ and $x = x_T$. Such score functions are trivial in that they always lead to asserting the truth or falsity of any event, a practice which is encouraged in present-day teaching by the requirement that the pupil is always expected to answer from a dichotomy: 'yes' or 'no', 'right' or 'wrong'.
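The behaviour of the quartic rule in Comment 2 can be explored numerically. The sketch below assumes the reading $f(x, E) = x^4/16 - 3x^2/8 + (1/2 - E)x$, which is the form implied by the stated transform $P(x) = (x + 2)(x - 1)^2/4$. It locates the value of $x$ minimizing the expected score by a plain grid search and counts the roots of $P(x) = p$ by sign changes. The grid resolution and function names are illustrative assumptions.

```python
def f_quartic(x, e):
    """Quartic score, with e = 1 if E is true and e = 0 if E is false."""
    return x ** 4 / 16 - 3 * x ** 2 / 8 + (0.5 - e) * x


def P_quartic(x):
    """Probability transform of the quartic rule."""
    return (x + 2) * (x - 1) ** 2 / 4


# Evenly spaced grid on [x_F, x_T] = [-2, 2]; the resolution is arbitrary.
N = 40000
XS = [-2.0 + 4.0 * i / N for i in range(N + 1)]


def stated_value(p):
    """Grid point minimizing the expected score p*f(x,1) + (1-p)*f(x,0)."""
    return min(XS, key=lambda x: p * f_quartic(x, 1) + (1 - p) * f_quartic(x, 0))


def roots_of_P(p):
    """Approximate roots of P(x) = p, found from sign changes on the grid."""
    vals = [P_quartic(x) - p for x in XS]
    return [0.5 * (XS[i] + XS[i + 1])
            for i in range(N) if vals[i] * vals[i + 1] < 0]


for p in (0.49, 0.51):
    x = stated_value(p)
    # The minimizer jumps from near -sqrt(3) to near +sqrt(3) as p crosses 1/2,
    # while P(x) = p keeps three roots in (-2, 2).
    print(p, round(x, 3), round(P_quartic(x), 3), len(roots_of_P(p)))
```

Only one of the three roots of $P(x) = p$ is selected by expected-score minimization, and it always satisfies $3 \le x^2 \le 4$, in line with the discussion above.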


A strange scoring rule illustrating this is the square-root rule with $f(x, 1) = \sqrt{1 - x}$ and for 0 [...]

We now turn to scoring rules that are more useful. The logarithmic form,
$$f(x, 1) = -\log x, \qquad f(x, 0) = -\log(1 - x),$$
is defined only in $[0, 1]$. It is proper with $P(x) = x$. The hyperbolic form
$$f(x, 1) = x^{-1}, \qquad f(x, 0) = (1 - x)^{-1}$$
is also only defined in $[0, 1]$. It has $P(x) = x^2/\{x^2 + (1 - x)^2\}$ and is not proper, although $P(x) = p$ has a unique root in $x$ for each 0 [...] $x_T = +\infty$ and $P(x) = 1/(1 + e^{-x})$ ranging from 0 to 1. This is improper but nevertheless a possibly useful rule in that it encourages the person to select $x$ corresponding to a probability where $p = 1/(1 + e^{-x})$ and hence $x = \log\{p/(1 - p)\}$. In other words, the values announced are log-odds.

The rules with $f(x, 1) = 1 - F_1(x)$ and $f(x, 0) = F_0(x)$, where $F_i(x)$ are distribution functions on $(-\infty, \infty)$, are interesting because they are bounded both above and below and are defined on $[-\infty, \infty]$. If $f_1(x)$ and $f_0(x)$ are the corresponding densities,
$$P(x) = f_0(x)/\{f_0(x) + f_1(x)\}.$$
Often these do not provide acceptable rules since the range of $P(x)$ is not the full unit interval. An extreme case arises with $f_0(x) = f_1(x)$ when $P(x) = \tfrac{1}{2}$ for all $x$ and only $\pm\infty$ are admissible: see comment 12 below. If $f_1(x)$ corresponds to $N(1, 4)$ and $f_0(x)$ to $N(-1, 4)$, then $P(x) = 1/(1 + e^{-x})$ and we are back to a log-odds rule.

Comment 5. The notion of admissibility is essentially that of Pareto optimality. One way of expressing the result of this paper is to say that a person who accepts Pareto optimality and the invariance assumption, and who then, by some unstated process, selects a unique value from the Pareto set, is effectively introducing probabilities and minimizing an expected value. In situations where the single-valued condition does not obtain, many of the values in the Pareto set are ruled out. (Nau's quartic rule illustrates this.)

Comment 6. The considerations of this paper have considerable practical import besides the justification of the Bayesian argument. Consider a geologist who, after a survey, is asked to express his uncertainty about $E$, the existence of oil at a site, conditional on the result $F$ of the survey. Then he may well see the position in terms of implicit score functions reflecting the dangers of giving a high value, so encouraging drilling, when the area is dry; and the lesser dangers of giving a low value when subsequent drilling reveals oil. It would not be unreasonable to expect that the implicit score function was improper and that he will therefore be motivated to give $x$ rather than his probability $p$. This suggests that in many cases attention should be paid to the score function so that the stated value may be transformed onto the probability scale. If the geologist provides several assessments then information about the transform, and hence about the scores, can be found from the known probability structure of the transformed value. It may, of course, happen that the implicit score function just referred to does not obey the regularity conditions, in which case the geologist will be led to make emphatic statements about the existence of oil, as was mentioned in comment 3.
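To make the idea of transforming a stated value onto the probability scale concrete, the sketch below implements two of the transforms quoted above together with their inverses: the log-odds transform $P(x) = 1/(1 + e^{-x})$ and the hyperbolic-rule transform $P(x) = x^2/\{x^2 + (1 - x)^2\}$. The function names are assumptions, and the inverse for the hyperbolic rule is derived here from $x/(1 - x) = \sqrt{p/(1 - p)}$ rather than taken from the paper.

```python
import math


def p_from_log_odds(x):
    """Probability corresponding to a stated log-odds value x."""
    return 1.0 / (1.0 + math.exp(-x))


def log_odds_from_p(p):
    """Stated value under a log-odds rule for probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def p_from_hyperbolic(x):
    """Transform of the hyperbolic rule f(x,1) = 1/x, f(x,0) = 1/(1-x)."""
    return x * x / (x * x + (1.0 - x) ** 2)


def hyperbolic_from_p(p):
    """Unique root in (0, 1) of P(x) = p for the hyperbolic rule."""
    r = math.sqrt(p / (1.0 - p))
    return r / (1.0 + r)


# Hypothetical stated values, mapped back to the probability scale.
for x in (-2.0, 0.0, 1.5):
    print("log-odds", x, "->", p_from_log_odds(x))
for x in (0.2, 0.5, 0.9):
    print("hyperbolic", x, "->", p_from_hyperbolic(x))
```

In the geologist example the transform would not be known in advance; this sketch only shows the final step once a transform has been identified.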


Comment 7. We now consider ways of assigning numbers to uncertain events that have been suggested in the literature to see if they lead to admissible values when judged by any scoring rule. For a real parameter $\theta$, the method of (one-sided) confidence intervals enables a number to be attached to the event $E$, that $\theta < a$, conditional on $F$, the data: this is the confidence that $\theta < a$ and we write cf($\theta$ [...]
$$\mathrm{cf}(\theta < -1 \mid \text{data}) = \alpha, \qquad \mathrm{cf}(\theta < +1 \mid \text{data}) = \beta, \qquad (3)$$
$$\mathrm{cf}(\theta < -1 \mid \text{data},\ \theta < +1) = \gamma. \qquad (4)$$
Then, if the confidence method is admissible, we must be able to find $P(\cdot)$ such that $P(\alpha) = P(\beta)P(\gamma)$: see Lemma 3. But since the first confidence statement in (3) is derived from a probability statement valid for all $\theta$, the restriction to $\theta < +1$ in (4) makes no difference to the validity of the statement and hence $\gamma = \alpha$. Consequently $P(\alpha) = P(\beta)P(\alpha)$ and either $P(\beta) = 1$ or $P(\alpha) = 0$. Hence there is no transform of a confidence statement to a probability statement and the confidence values are inadmissible.

Comment 8. Another way of assigning numbers is through significance tests. Let data $x$ have an exponential distribution with density $\theta e^{-\theta x}$ for $x > 0$, $\theta > 0$. To test the hypothesis that $\theta = \omega$, against the alternative $\theta \ne \omega$, when $x$ is unexpectedly large on hypothesis $\omega$, the tail of the null distribution is used:
$$P(X > x \mid \omega) = \int_x^\infty \omega e^{-\omega t}\,dt = e^{-\omega x}$$
and $\mathrm{sg}(\omega \mid x) = e^{-\omega x}$ is the significance attached to the event $E$ that $\theta = \omega$, given $F$, the data. If $x$ is small, the other tail is used and $\mathrm{sg}(\omega \mid x) = 1 - e^{-\omega x}$. Hence for all $x$
$$\mathrm{sg}(\omega \mid x) = \min\{e^{-\omega x},\ 1 - e^{-\omega x}\}.$$
For this to correspond to a scoring rule, there must exist a transform $P(\cdot)$ of these values to nonnegative values with the integral over all $\omega$ equal to unity: this is the addition rule of probability. But the significance value depends only on $\omega x$, so $\int P(\omega x)\,d\omega = 1$. Let $\omega x = u$; then $x^{-1}\int P(u)\,du = 1$ for all $x$, which is impossible. Hence significance statements are inadmissible.

Comment 9. The discussion in comments 7 and 8 of confidence and significance statements is based on my personal understanding of these methods. That understanding may be defective because the methods are not unambiguously described. For example, in comment 7, is the result that led to $\gamma = \alpha$ correct? Is a confidence statement altered if the parametric range is restricted? The discussion of significance levels in comment 8 is similarly bedevilled by the ambiguity over whether one- or two-sided tests are appropriate: we have used only the one-sided form. It is my conviction that both these methods are inadmissible because they violate the likelihood principle, that easily follows from the probabilistic description of uncertainty.

Comment 10. Another way of assigning numbers to uncertain events has been suggested by Zadeh (1979). These numbers are called possibilities. Let all statements be conditional on the same event not described in the notation. Then the possibilities $\Pi(E)$ for events $E$ satisfy the rule of combination
$$\Pi(E \cup F) = \max\{\Pi(E),\ \Pi(F)\}.$$
This is in conflict with the corresponding probability rule $p(E \cup F) = p(E) + p(F)$ for exclusive events. For if there was a transform of possibilities it would, as we saw after the proof of Theorem 2, have to be monotone and the maximization, rather than the addition, rule would be preserved.
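The conflict described in Comment 10 can be seen in a few lines. Because a monotone increasing transform commutes with the maximum, the transformed possibility of a union equals the larger of the two transformed values, and so falls short of their sum whenever both are positive. The sketch below checks this for three candidate transforms. The transforms and the possibility values 0.6 and 0.3 are arbitrary illustrations, not taken from the paper or from Zadeh (1979).

```python
def possibility_of_union(pi_e, pi_f):
    """Rule of combination for possibilities: Pi(E u F) = max(Pi(E), Pi(F))."""
    return max(pi_e, pi_f)


def addition_rule_gap(transform, pi_e, pi_f):
    """Transformed possibility of the union minus the sum required by the
    addition rule of probability for exclusive events."""
    lhs = transform(possibility_of_union(pi_e, pi_f))
    rhs = transform(pi_e) + transform(pi_f)
    return lhs - rhs


# Three monotone increasing candidate transforms (hypothetical choices).
transforms = {
    "identity": lambda t: t,
    "square": lambda t: t * t,
    "odds-like": lambda t: t / (1.0 + t),
}

for name, tr in transforms.items():
    # The gap equals minus the smaller transformed value, so it is negative.
    print(name, addition_rule_gap(tr, 0.6, 0.3))
```

The gap is zero only when one of the two transformed values is zero, so no monotone transform turns the maximization rule into the addition rule in general.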


Comment 11. An extension of the idea of using a single number to describe the uncertainty of $E$ conditional on $F$ is to use two values, $x_1$, $x_2$. They are sometimes called upper and lower probabilities. To score these, one might use a function $f(x_1, x_2, E)F$. Consider applying the admissibility ideas here. (We omit the details which parallel those given above.) With $(x_1, x_2)$ stated for $E$ conditional on $F$, the scores are
$$EF = 1: \quad f(x_1, x_2, 1),$$
$$(1 - E)F = 1: \quad f(x_1, x_2, 0).$$
As before consider small changes $\delta x_1$, $\delta x_2$ in the values. Then the score changes will be
$$f_1(x_1, x_2, 1)\,\delta x_1 + f_2(x_1, x_2, 1)\,\delta x_2,$$
$$f_1(x_1, x_2, 0)\,\delta x_1 + f_2(x_1, x_2, 0)\,\delta x_2,$$
where $f_i$ denotes the derivative with respect to the $i$th argument. For admissibility the determinant must vanish. This determinant is equal to the Jacobian of the transformation from $(x_1, x_2)$ to $(f(x_1, x_2, 1), f(x_1, x_2, 0))$. If it vanishes everywhere, the latter functions assume constant values on the same curve in the $(x_1, x_2)$ plane, so that there is no reason to choose between different values on the curve and the subject is effectively only using one number (that describes which curve), rather than two, to measure his uncertainty. If the Jacobian does not vanish everywhere then the values of $(x_1, x_2)$ are confined to the curve where it does vanish, namely where
$$\frac{f_1(x_1, x_2, 1)}{f_2(x_1, x_2, 1)} = \frac{f_1(x_1, x_2, 0)}{f_2(x_1, x_2, 0)}. \qquad (5)$$
Call this common value $h(x_1, x_2)$. Then again, in effect, the subject is only providing a single number describing his position on that curve.

For example, suppose $(x_1, x_2)$ is given for $E$, and $(y_1, y_2)$ for $\bar{E}$, both conditional on $F$. This is the situation comparable to that in Lemma 2 and the total scores are
$$EF = 1: \quad f(x_1, x_2, 1) + f(y_1, y_2, 0),$$
$$(1 - E)F = 1: \quad f(x_1, x_2, 0) + f(y_1, y_2, 1).$$
The changes in scores, resulting from changes $(\delta x_1, \delta x_2)$ in $(x_1, x_2)$ and $(\delta y_1, \delta y_2)$ in $(y_1, y_2)$, will be, on utilizing (5),
$$f_2(x_1, x_2, 1)\,[h(x_1, x_2)\,\delta x_1 + \delta x_2] + f_2(y_1, y_2, 0)\,[h(y_1, y_2)\,\delta y_1 + \delta y_2],$$
$$f_2(x_1, x_2, 0)\,[h(x_1, x_2)\,\delta x_1 + \delta x_2] + f_2(y_1, y_2, 1)\,[h(y_1, y_2)\,\delta y_1 + \delta y_2].$$
The vanishing of the determinant gives
$$f_2(x_1, x_2, 1)\,f_2(y_1, y_2, 1) = f_2(x_1, x_2, 0)\,f_2(y_1, y_2, 0),$$
or, if
$$P(x_1, x_2) = \frac{f_2(x_1, x_2, 0)}{f_2(x_1, x_2, 0) - f_2(x_1, x_2, 1)},$$
that $P(x_1, x_2) + P(y_1, y_2) = 1$ and we are back to the addition rule for probabilities. The product rule follows similarly.

This does not close the book on the idea of using two or more numbers to describe uncertainty, for it might be reasonable to use two or more score functions, measuring different qualities of the descriptions in the manner of a multiattribute utility function.

Comment 12. The argument of Shafer (1976) is affected by the scoring-rule criterion. He suggests, in the situation of Lemma 2, that any values, $x$ for $E$, $y$ for $\bar{E}$, could be used subject only to the requirements that $x \ge 0$, $y \ge 0$, $x + y \le 1$. Such numbers are possible values for a belief function. But Lemma 2 shows that $P(x) + P(y) = 1$ and hence the only scoring rule to make all Shafer's values admissible has $P(x) = \tfrac{1}{2}$, or $f(x, 0) + f(x, 1) = $ constant. But this contradicts the product rule in Lemma 3. Alternatively, $P(x) = \tfrac{1}{2}$ means that $f'(x, 0) = -f'(x, 1)$ and hence
$$\lim_{x \downarrow x_F} f'(x, 0)/f'(x, 1) = -1,$$
in contradiction of the regularity condition.

Comment 13. Notice that in the score assumption we have supposed that $n$, the number of events judged, is finite. The infinite case causes difficulties due to the possible divergence of the series describing the total scores. As a result we have only established the addition rule for a finite number of events and the resulting probability is only finitely-additive and not $\sigma$-additive. We have been unable to see how, or even if it is possible, to extend the notion of a score to an enumerable infinity of statements.

Acknowledgements

I am grateful to Richard E. Barlow for inviting me to Berkeley and to L.A. Zadeh who asked me to give a seminar on the relationship between probability and the ideas of fuzzy logic. This seminar suggested the possibility of the existence of a scoring rule that led to the laws of fuzzy sets: the paper shows no such rule exists. The observations of Robert Nau on a first draft of the paper have been of considerable value to me. This research has been partially supported by the Air Force Office of Scientific Research (AFSC), USAF, under Grant AFOSR-77-3179 and the Office of Naval Research under Contract N00014-75-C-0781 with the University of California.

References

De Finetti, B. (1974). Theory of Probability, Vol. 1. New York: Wiley.
Dempster, A.P. (1968). A generalization of Bayesian inference. J. R. Statist. Soc. B 30, 205-247.
Savage, L.J. (1971). Elicitation of personal probabilities and expectations. J. Am. Statist. Assoc. 66, 783-801.
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press.
Smith, C.A.B. (1961). Consistency in statistical inference and decision. J. R. Statist. Soc. B 23, 1-37.
Zadeh, L.A. (1979). Possibility theory and soft data analysis, Memo. UCB/ERL M79/66. University of California, Berkeley.

Résumé

Suppose that a person expresses his uncertainty about an event $E$, conditional on an event $F$, by a number $x$, and suppose that he is assigned, as a result, a number $s$ which depends on $x$ and on the truth or falsity of $E$ when $F$ is true. It is shown that if the numbers $s$ are additive for different events, and if the person chooses only admissible values, then there exists a known transformation of the values $x$ to values which are probabilities. In particular, it follows that values $x$ derived from significance tests, confidence intervals or the rules of fuzzy logic are inadmissible. Only probability is a reasonable description of uncertainty.

[Paper received March 1981, revised June 1981]

Discussion of paper by D.V. Lindley

G.A. Barnard
Mill House, Hurst Green, Brightlingsea, Colchester, Essex, England

Comment 1. Dennis Lindley is to be congratulated on a neat proof of a wide generalization of the results of De Finetti and others, that given a finite set $S$ of propositions closed under negation and conjunction (and hence under disjunction also), any 'admissible' measure of uncertainty
