Random Set Inference
Marc Henry
Columbia University
First draft: January 28, 2004
This draft¹: September 22, 2006

Abstract

Random correspondences naturally arise in a decision problem with objective information obtained from observed frequencies in incompletely identified models. A preference relation over acts available to the decision maker is shown to be represented by a generalized Hurwicz criterion if it is weakly compatible with such objective information, in the sense that it preserves stochastic dominance, and it satisfies the normatively compelling rationality requirements of transitivity and monotonicity. We then give a fully behavioural account of a preference relation compatible with this information structure and interpret it as reasoning by analogy. Finally, we show that the collection of unambiguous events for this preference relation is an algebra, and that a special case of reasoning by analogy can be construed as a combination of decision under risk and decision under complete ignorance.

¹ I thank Lars Hansen and Mark Machina for sparking my interest in this topic, and Massimiliano Amarante, Massimo Marinacci and Alexei Onatski for extensive and enlightening discussions. The usual disclaimer applies.

Introduction

In his presidential address to the Econometric Society on September 10, 1970 in Cambridge, entitled “Econometrics and Decision Theory,” Jacques Drèze gave an anatomy of an economic decision problem under uncertainty in the light of the Savage theorem (Savage (1954)). The Savage framework specifies as primitives of the decision problem a set of states of the world, which describes the environment, a set of consequences, which describes what could happen to the decision maker, and a set of available acts, which map the states into consequences, endowed with a preference relation satisfying some rationality requirements. The Savage theorem states that the above specified preference relation is uniquely represented by an expected utility criterion with respect to a single probability measure on the set of states and a cardinal utility on consequences. Drèze inferred that “a decision problem under uncertainty can be logically decomposed into the following four steps: (i) define the set of acts; (ii) define a utility on the consequences; (iii) define a probability measure on the events; and (iv) find an act which is maximal with respect to expected utility.” He goes on to “suggest that positive economics should assist in defining the acts, normative economics should assist in defining the utility, that statistics and econometrics should bring empirical observations to bear upon the assessment of the probabilities, and that techniques in mathematical programming should assist in finding the optimal act.”

Point (iii) has received a considerable amount of attention during the past two decades both in the economic decision theory and the econometrics literatures. On the one hand, the postulates of the Savage model that produce the assessment of acts by a linear functional (namely the independence axiom and the sure-thing principle) have been called into question since Ellsberg (1961) on the grounds that they are inconsistent with observed choices, and insufficiently compelling from a normative point of view. Extensions of the Savage representation to account for certain behavioural departures include Schmeidler (1989), Gilboa and Schmeidler (1989), Ghirardato, Maccheroni, and Marinacci (2004), and Maccheroni, Marinacci, and Rustichini (2004). They all share nonlinearity of the functional used in the assessment of acts, and hence eschew the representation with a single probability measure.
On the other hand, the ability of econometric models to identify, hence estimate from observed frequencies, the distribution of residual uncertainty often rests on strong prior assumptions that are difficult to substantiate and even to analyze within the decision problem. A recent approach, pioneered by Chuck Manski, has been to forego such prior assumptions, thus giving up the ability to identify a single probability distribution for residual uncertainty, and to allow instead for a set of distributions compatible with the empirical setup. A variety of models have been analyzed in this way, whether partial identification stems from incompletely specified models or from structural data insufficiencies. See Manski (2005) for an up-to-date survey on the topic. All these models with incomplete identification share the fundamental structure that the residual uncertainty and the relevant observable quantities are linked by a multi-valued mapping instead of a one-to-one mapping as in the case of identification. Therefore, even if we abstract from the estimation problem, observed frequencies do not translate into a single distribution for residual uncertainty to be appropriated in step (iii) of the Drèze program, but into a set of distributions compatible with a random correspondence.

It is natural then to investigate the consequences on the Drèze program of replacing the probability measure of step (iii) by the random correspondence that arises from observed frequencies in a partially identified model. More precisely, can we recover a Savage-like theorem that would lend itself to the kind of objective subdivision of the decision process that Drèze described, and that is compatible with the information that can be drawn from partially identified models? We give our answer in two steps. First, we strip down the properties of a complete preference ranking on the available acts to the normatively more compelling of the Savage axioms (i.e. transitivity and monotonicity), and the requirement that the induced preference on consequences be representable by a real valued utility function. We investigate the consequence of requiring that this complete preference relation respect stochastic dominance relative to the random correspondence. We find that under those normatively uncontroversial requirements, the preference relation is represented by a generalized Hurwicz criterion in the spirit of Ghirardato, Maccheroni, and Marinacci (2004).
Once we have identified the implications of strict adherence to information in the form of a set of priors derived from a random correspondence, the natural question that arises is under which conditions on the preference relation a decision maker acts as if their subjective representation of uncertainty were of that form. Hence our second step in the recovery of point (iii) in the Drèze program is a representation theorem. We spell out the required properties for the beliefs implied by a given preference relation to conform to the type of information structure studied here. In order to give a subjective or “as if” interpretation to the random correspondence that generates beliefs, we need to extend the primitives in our decision set-up. Both range and domain spaces of the correspondence are considered given as primitives of the decision problem. In other words, the decision maker is confronted with two decision problems which are logically distinct. One may think of having to simultaneously choose a portfolio allocation on the New York and the Tokyo stock exchanges. Assume the decision maker is very familiar with the New York stock exchange, so much so that they maximize expected utility. Assume on the other hand that the decision maker’s only way of apprehending the Tokyo stock exchange portfolio choice problem is through its perceived similarities with the New York stock exchange portfolio choice problem. Since the perceived relation between the two problems need not be precise, it is natural to construe it as a multi-valued correspondence from the familiar state space (set of states of the world relevant to the New York decision problem) to the unfamiliar one (set of states of the world relevant to the Tokyo decision problem). The natural interpretation of the random correspondence that will arise from the representation theorem, therefore, is that of an analogy between two decision problems. The primitives are two sets of states of the world and their algebras of events, and two prize spaces, and the representation theorem will deliver cardinal utilities on the prize spaces, a single probability measure on the familiar state space, and a multi-valued correspondence transporting this probability measure to the unfamiliar state space in the form of a generally non-singleton set of priors. The functionals for evaluating acts will be expected utility in the familiar decision problem and maxmin expected utility in the unfamiliar one.
The proof of the representation theorem is based on an equivalence between lower envelopes of sets of probabilities induced by a random correspondence and infinitely monotone capacities, whereas some salient features with behavioural interpretation are derived using a necessary and sufficient condition, called decomposability, for a set of functions to be the set of selections of a random correspondence. We shall see, in particular, that the collection of unambiguous events in the case of reasoning by analogy is a σ-algebra, and that the functional characterizing the preference relation can be decomposed on the partition of the state space generating this σ-algebra into a combination of subjective expected utility and decision under complete ignorance in the sense of Cohen and Jaffray (1980).

Relation to the existing literature

Nehring (2001) and Gajdos, Tallon, and Vergnaud (2004) consider combining objective probabilities within a subjective decision set-up; however, they require a weaker notion of compatibility of the primitive preference relation with the objective probabilistic information, so that they allow salient subjective beliefs to complement the objective probabilistic information. Here, we combine compatibility with a notion of complete ignorance to obtain a much stronger representation. Philippe, Debs, and Jaffray (1999) and Ghirardato (2001) represent preference relations in decision settings where the actions available to the decision maker are multi-valued mappings rather than single-valued ones. Naturally, we can consider here the composition of the single-valued acts with the multi-valued mapping, to obtain a similar perspective, but the focus of this paper is the multi-valued mapping itself, and its interpretation as the information available to the decision maker, and indeed the only information taken into account. Gilboa and Schmeidler (2001) and Amarante (2004) present general notions of analogical reasoning, the notion presented here being very much in the spirit of the latter. However, the requirements of the present notion are both weaker, in that the analogy only concerns beliefs on the likelihood of events in the two sets of states compared, as opposed to the whole structure of the preference relation, and stronger, in the sense of added structure on the model for the way beliefs are translated.

Organization of the paper

The next section defines probabilistic information derived from random correspondences, and motivates its consideration. Section 2 investigates the implications of rational decision making with strict adherence to probabilistic information derived from a random correspondence. Section 3 gives a representation theorem for such behaviour, and interprets it as reasoning by analogy, and Section 4 describes the unambiguous structure of the latter preference, and some salient properties associated with it. Proofs and related materials are collected in the appendix.

1 Information from Partially Identified Models

We consider a traditional decision setup where $S$ is the set of states of the world, events form an algebra $\Sigma$ of subsets of $S$, and $Y$ is a set of consequences, taken to be a convex subset of a vector space.¹ The decision maker is presented with a set of acts $\mathcal{F}$, which are $\Sigma$-measurable functions from the set of states $S$ to the set of consequences $Y$. Given two mappings $f, g$ and a subset $A$ of their domain $D$, we shall always denote by $fAg$ the mapping equal to $f$ on $A$ and $g$ on $D \setminus A$. Finally, we denote by $\mathcal{M}(S)$ (resp. $\mathcal{M}_{ca}(S)$) the set of finitely (resp. countably) additive probabilities on $(S, \Sigma)$.

We assume there exists a true data generating process governing the outcomes in $S$, in the form of an element of $\mathcal{M}(S)$, which is unknown. To obtain information about the true data generating process, the decision maker has access to past experimental data. An experimental device produces observations in a set $\tilde{Y}$ according to a known mapping $f : S \to \tilde{Y}$. In many instances, the experimental device is known with insufficient precision for $f$ to be assumed one-to-one, or even single-valued. In either case, the inverse of $f$, defined for all $\tilde{y} \in \tilde{Y}$ as $f^{-1}(\tilde{y}) = \{s \in S \mid \tilde{y} \in f(s)\}$, is a multi-valued mapping. If $f$ is single-valued, its inverse has disjoint images. However, this is generally not the case when $f$ is multi-valued. We therefore consider the channel through which we gain information about the set of states $S$ from observed frequencies in $\tilde{Y}$ to be a multi-valued mapping (hereafter correspondence)

$$\Gamma : \tilde{Y} \rightrightarrows S, \qquad \tilde{y} \mapsto \Gamma(\tilde{y}),$$

where $\Gamma(\tilde{y})$ is a subset of $S$.

Example 1 (Treatment response) A decision maker needs to choose a treatment for an individual based on the observed outcomes from a treated sample. A treatment is a measurable function $t : S \to \tilde{Y}$ from states of the world to outcomes (generally conditioned on covariates), and $P$ is the probability on $\tilde{Y}$ estimated from observed outcomes. Then $\Gamma = t^{-1}$ is a correspondence with disjoint images.

Example 2 (Multiple equilibria) An economic model with multiple equilibria can be construed as a multi-valued mapping $M : S \rightrightarrows \tilde{Y}$ from states of the world to observed outcomes (again generally conditioned on covariates). Again $P$ is the probability on $\tilde{Y}$ estimated from observed outcomes. Then $\Gamma = M^{-1}$ is a correspondence with generally non-disjoint images.

¹ ?) show that even if the primitive set of consequences does not have a convex structure, one can still define mixtures in a subjective sense. Hence using the Anscombe and Aumann (1963) framework is without conceptual loss of generality.

Without substantial loss of generality, we can assume that $\tilde{Y}$ is $[0, 1]$ endowed with its Borel σ-algebra $\mathcal{B}$ (the σ-algebra generated by the open sets) and Lebesgue measure $\mu$. We also assume that the primitive set of states $S$ has the cardinality of the continuum, and is endowed with a separable and completely metrizable topology, and that $\Sigma$ is the Borel σ-algebra relative to that topology. We assume that the correspondence $\Gamma$ is measurable in the traditional sense, defined below:

Definition 1 (Effros Measurability) A correspondence $\Gamma : (\tilde{Y}, \mathcal{B}) \rightrightarrows (S, \Sigma)$ is said to be Effros measurable, or weakly measurable, or simply measurable, if the inverse image of open sets is measurable, i.e. if for all open subsets $O$ of $S$, $\Gamma^{-1}(O) = \{\tilde{y} \in \tilde{Y} \mid \Gamma(\tilde{y}) \cap O \neq \emptyset\} \in \mathcal{B}$.

There are several ways a measurable correspondence can convey probabilistic information on its image space $(S, \Sigma)$ given observed frequencies of outcomes in $\tilde{Y}$. We abstract from estimation considerations, and assume that the probability $P$ generating observations $\tilde{y}$ is given as a product of the experimentation. Dempster (1967) suggests considering the smallest reliability that can be associated with the event $A \in \Sigma$ as the belief function

$$\underline{P}(A) = P\{\tilde{y} \in \tilde{Y} \mid \Gamma(\tilde{y}) \subseteq A\}$$

and the largest plausibility that can be associated with the event $A$ as the plausibility function

$$\overline{P}(A) = P\{\tilde{y} \in \tilde{Y} \mid \Gamma(\tilde{y}) \cap A \neq \emptyset\},$$

the two being linked by the relation

$$\overline{P}(A) = 1 - \underline{P}(A^c), \tag{1}$$

which prompted some authors to call them conjugates or duals of each other. A natural way to construct a set of probability measures is to consider all probability measures that dominate the set function $\underline{P}$ set-wise, thus forming the core of the belief function:

$$\mathrm{Core}(\Gamma) = \{\pi \in \mathcal{M}(S) \mid \forall A \in \Sigma,\ \pi(A) \geq \underline{P}(A)\} = \{\pi \in \mathcal{M}(S) \mid \forall A \in \Sigma,\ \pi(A) \leq \overline{P}(A)\},$$

where the first equality can be taken as a definition, and the second follows immediately from (1). It is well known that $\mathrm{Core}(\Gamma)$ is non-empty, and it will be shown as a consequence of (2) below.

A different way of defining probabilistic information generated by the correspondence $\Gamma$ can be derived from Aumann’s idea (in Aumann (1965)) of considering correspondences as bundles of their selections. Define the domain of the correspondence $\Gamma$ by $\mathrm{Dom}(\Gamma) = \{\tilde{y} \in \tilde{Y} \mid \Gamma(\tilde{y}) \neq \emptyset\}$. A measurable selection $\gamma$ of the measurable correspondence is defined by the property below:

Definition 2 (Measurable Selection) A measurable selection of a correspondence $\Gamma : (\tilde{Y}, \mathcal{B}) \rightrightarrows (S, \Sigma)$ is a $(\mathcal{B}, \Sigma)$-measurable function $\gamma$ such that $\gamma(\tilde{y}) \in \Gamma(\tilde{y})$ for all $\tilde{y} \in \mathrm{Dom}(\Gamma)$.

The set of measurable selections of a measurable correspondence $\Gamma$ is denoted $\mathrm{Sel}(\Gamma)$, and it is non-empty by a theorem due to Rokhlin (Rokhlin (1949) Part I, §2, No 9, Lemma 2) and generally attributed to Kuratowski and Ryll-Nardzewski:

Theorem 1 (Rokhlin) An Effros measurable correspondence $\Gamma$ with closed non-empty values admits a measurable selection.

For a proof, see for instance Theorem 8.1.3 page 308 of Aubin and Frankowska (1990). Elements of $\mathrm{Sel}(\Gamma)$ can be used to transport the probability $P$ on $\tilde{Y}$ to probabilities on $S$. For each $\gamma \in \mathrm{Sel}(\Gamma)$, consider the probability $\pi$ defined on each $A \in \Sigma$ by $\pi(A) = P\{\tilde{y} \in \tilde{Y} \mid \gamma(\tilde{y}) \in A\} = P \circ \gamma^{-1}(A)$, and define

$$\Pi(\Gamma) = \{\pi \in \mathcal{M}(S) \mid \pi = P \circ \gamma^{-1} \text{ for some } \gamma \in \mathrm{Sel}(\Gamma)\}.$$
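The belief and plausibility functions can be made concrete on a small finite example. The Python sketch below is only an illustration: the three-point outcome space, the probabilities in `P`, the images in `GAMMA` and all function names are assumptions introduced here, standing in for the formal setup (where the outcome space is $[0, 1]$ with Lebesgue measure). It checks the conjugacy relation (1) and core membership by brute-force enumeration of events.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite example: three observable outcomes, three states.
P = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}
GAMMA = {0: {"a"}, 1: {"a", "b"}, 2: {"b", "c"}}
S = frozenset({"a", "b", "c"})

def belief(A):
    """Lower probability: mass of the outcomes whose image lies inside A."""
    return sum(p for y, p in P.items() if GAMMA[y].issubset(A))

def plausibility(A):
    """Upper probability: mass of the outcomes whose image meets A."""
    return sum(p for y, p in P.items() if GAMMA[y] & set(A))

def events(states):
    """All subsets of a finite state space."""
    items = sorted(states)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def in_core(pi):
    """True if the probability pi dominates the belief function set-wise."""
    return all(sum(pi[s] for s in A) >= belief(A) for A in events(S))
```

In this example `belief({"a"})` is 1/2 and `plausibility({"a"})` is 4/5, so every prior in the core assigns the state `a` a probability between these two values.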

It is easily seen that

$$\Pi(\Gamma) \subseteq \mathrm{Core}(\Gamma), \tag{2}$$

but the reverse inclusion generally does not hold.² We call $\Pi(\Gamma)$ the set of priors induced by the correspondence $\Gamma$.
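Inclusion (2) can be checked by enumeration when everything is finite. The sketch below is a hypothetical illustration (the data in `P` and `GAMMA` and the function names are assumptions, not taken from the text): it pushes `P` forward through every selection of the correspondence and verifies that each resulting prior lies in the core. Because this toy outcome space has atoms, the set of selection priors is finite while the core is convex, so a strict mixture of two selection priors gives a simple instance of a core element that is not induced by any selection. The richer non-atomic setting of the paper is not captured by this example.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical finite stand-in for the outcome space, its law P and the correspondence.
P = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}
GAMMA = {0: {"a"}, 1: {"a", "b"}, 2: {"b", "c"}}
STATES = ("a", "b", "c")

def selection_priors():
    """Push P forward through every selection of GAMMA (one state per image)."""
    outcomes = sorted(GAMMA)
    priors = []
    for choice in product(*(sorted(GAMMA[y]) for y in outcomes)):
        pi = dict.fromkeys(STATES, Fraction(0))
        for y, s in zip(outcomes, choice):
            pi[s] += P[y]
        priors.append(pi)
    return priors

def belief(A):
    """Mass of the outcomes whose image lies inside A."""
    return sum(p for y, p in P.items() if GAMMA[y].issubset(A))

def in_core(pi):
    """True if pi dominates the belief function on every event."""
    subsets = (set(c) for r in range(len(STATES) + 1)
               for c in combinations(STATES, r))
    return all(sum(pi[s] for s in A) >= belief(A) for A in subsets)

PRIORS = selection_priors()
# A strict mixture of two selection priors: it stays in the (convex) core.
MIXTURE = {s: (PRIORS[0][s] + PRIORS[2][s]) / 2 for s in STATES}
```

Here `PRIORS` contains four distinct probabilities, all of which pass `in_core`, while `MIXTURE` passes `in_core` without belonging to `PRIORS`.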

2 Incorporating experimental information in a decision criterion

We shall now consider how a decision maker presented solely with the information derived from experimentation can rank available acts. Here we take the collection of available acts to be the set of all finite valued measurable functions, still denoted $\mathcal{F}$. We denote by $\trianglelefteq$ the induced incomplete likelihood relation over events in $\Sigma$, i.e. for all $A, B \in \Sigma$,

$$A \trianglelefteq B \iff \forall \pi \in \Pi(\Gamma),\ \pi(A) \leq \pi(B).$$

We call $\trianglelefteq$ the likelihood associated with the random correspondence $\Gamma$, and we denote by $\triangleleft$ its asymmetric component. The following proposition, which follows immediately from Corollary 1 of Castaldo, Maccheroni, and Marinacci (2004), justifies the use of a single partial likelihood relation:

Proposition 1 (Unique Likelihood) When $\Gamma$ is compact-valued (i.e. for all $\tilde{y} \in \tilde{Y}$, $\Gamma(\tilde{y})$ is compact), the incomplete likelihoods induced by $\Pi(\Gamma)$ and $\mathrm{Core}(\Gamma)$ are identical.

² One can also consider the intermediate set $\Pi'(\Gamma)$ of probabilities $Q$ such that for all $A$ in $\Sigma$, there is a $\pi$ in $\Pi(\Gamma)$ such that $Q(A) = \pi(A)$. However, since it is apparent that $\Pi(\Gamma) \subseteq \Pi'(\Gamma) \subseteq \mathrm{Core}(\Gamma)$, and we will argue that $\Pi(\Gamma)$ and $\mathrm{Core}(\Gamma)$ convey the same likelihood information, it would not add to this analysis.

The hypothetical ranking of acts in $\mathcal{F}$ is modelled by a complete binary relation on $\mathcal{F}$, which we call preference relation as usual, and which we denote by $\preccurlyeq$ (with symmetric component $\sim$ and asymmetric component $\prec$). We impose minimal a priori assumptions on the way the decision maker ranks the acts. First, we demand that all acts be ranked and we do not allow the decision maker to be indifferent between all acts, hence $\preccurlyeq$ satisfies:

Axiom 1 (Completeness and Non-Degeneracy) For all $f, g \in \mathcal{F}$, $f \preccurlyeq g$ or $g \preccurlyeq f$, and there are $f, g \in \mathcal{F}$ such that $f \prec g$.

We add the two normatively uncontroversial properties of transitivity and monotonicity. Monotonicity says that an act that yields a preferred consequence in all states of the world should be preferred, whereas transitivity says that if the decision maker ranks $h$ above $g$ above $f$, then they must rank $h$ above $f$. Both may be violated in some observed behaviour, and transitivity is often incompatible with aggregate preferences; however, both have been associated with rationality of a preference ordering at least since Ramsey (1926).

Axiom 2 (Transitivity) If $f, g, h \in \mathcal{F}$, $f \preccurlyeq g$ and $g \preccurlyeq h$, then $f \preccurlyeq h$.

Axiom 3 (Monotonicity) If $f, g \in \mathcal{F}$ and $f(s) \preccurlyeq g(s)$ for all $s \in S$, then $f \preccurlyeq g$.

Finally we consider the implications on $\preccurlyeq$ of conforming to the information available. We first impose the requirement that the preference over binary acts (i.e. acts of the form $xAy$, for $x, y \in Y$ constants, and $A \in \Sigma$ an event) conform with the incomplete likelihood relation $\trianglelefteq$. A preference relation $\preccurlyeq$ on $\mathcal{F}$ is said to be compatible with the likelihood relation $\trianglelefteq$ if the following holds:³

Definition 3 (Compatibility) $\preccurlyeq$ is compatible with $\trianglelefteq$ if for all events $A, B \in \Sigma$ and consequences $x, y \in Y$ such that $x \prec y$, we have $A \trianglelefteq B$ (resp. $A \triangleleft B$) $\Rightarrow$ $xAy \preccurlyeq xBy$ (resp. $xAy \prec xBy$).

This notion of compatibility of the preference relation with the incomplete likelihood obtained from frequency data is identical to the notion used by Nehring (2001). We shall also consider a departure from the latter, which requires strict adherence to the available objective information, eschewing all additional subjective beliefs, according to the following Axiom. This latter notion of compatibility is more closely related to the principle of complete ignorance of Cohen and Jaffray (1980).
Axiom SA (Strict Adherence) $\preccurlyeq$ is compatible with the incomplete likelihood $\trianglelefteq$, and if it is compatible with an incomplete likelihood $\trianglelefteq^* \supseteq \trianglelefteq$, then $\trianglelefteq^* = \trianglelefteq$.

³ Note that non-degeneracy and monotonicity imply that we can construct two constant acts $x$ and $y$ such that $x \prec y$. For $f \prec g$, take $x = \min_{s \in S} f(s)$ and $y = \max_{s \in S} g(s)$.
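The incomplete likelihood relation is a unanimity ordering over a set of priors, which is easy to experiment with when that set is finite. The following Python sketch assumes a hypothetical set of four priors on three states (the numbers and all function names are illustrative, not taken from the text) and shows that some pairs of events are ranked while others are left unranked.

```python
from fractions import Fraction as F

# Hypothetical finite set of priors on the states a, b, c.
PRIORS = [
    {"a": F(4, 5), "b": F(1, 5), "c": F(0)},
    {"a": F(4, 5), "b": F(0), "c": F(1, 5)},
    {"a": F(1, 2), "b": F(1, 2), "c": F(0)},
    {"a": F(1, 2), "b": F(3, 10), "c": F(1, 5)},
]

def prob(pi, event):
    """Probability of an event (a set of states) under the prior pi."""
    return sum(pi[s] for s in event)

def weakly_less_likely(A, B):
    """A is ranked below B when every prior agrees that pi(A) <= pi(B)."""
    return all(prob(pi, A) <= prob(pi, B) for pi in PRIORS)

def strictly_less_likely(A, B):
    """Asymmetric component: A is ranked below B but B is not ranked below A."""
    return weakly_less_likely(A, B) and not weakly_less_likely(B, A)

def comparable(A, B):
    """False when the priors disagree on the ranking of A and B."""
    return weakly_less_likely(A, B) or weakly_less_likely(B, A)
```

With these priors, `{"b"}` is strictly less likely than `{"a"}`, whereas `{"b"}` and `{"c"}` are not comparable, which is the sense in which the likelihood relation is incomplete.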


We now consider how the preference on multi-valued acts is constrained by the likelihood relation. Notice first that $\trianglelefteq$ induces a relation of stochastic dominance on acts in $\mathcal{F}$ in the following way:

Definition 4 (Stochastic Dominance) An act $f \in \mathcal{F}$ is said to stochastically dominate an act $g \in \mathcal{F}$ relative to $\trianglelefteq$ if for any consequence $x \in Y$, $\{s \in S \mid f(s) \preccurlyeq x\} \trianglelefteq \{s \in S \mid g(s) \preccurlyeq x\}$.

Requiring that this stochastic dominance relation be respected by the ranking of acts $\preccurlyeq$ is as compelling from a rationality point of view as the property of monotonicity, once it is accepted that the ranking should not be at odds with the likelihood relation derived from experimental information. Hence, we impose the rationality requirement that $\preccurlyeq$ respect stochastic dominance relative to the probabilistic beliefs in the sense of Axiom SD below:

Axiom SD (Stochastic Dominance) For all acts $f, g \in \mathcal{F}$, $f \succcurlyeq g$ whenever $f$ stochastically dominates $g$ with respect to $\trianglelefteq$.

Blackwell’s characterization of stochastic dominance allows us to show the following:

Proposition 2 (Blackwell) For any two acts $f, g \in \mathcal{F}$, $f$ stochastically dominates $g$ with respect to $\trianglelefteq$ if and only if for any ordinal utility index $u$ of $\preccurlyeq$, we have

$$\forall \pi \in \Pi(\Gamma), \quad \int_S u(f)\, d\pi \geq \int_S u(g)\, d\pi.$$
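Definition 4 and the unanimity condition of Proposition 2 can be compared numerically on a finite example. The sketch below is an illustration under stated assumptions: consequences are real numbers ranked by the usual order, the four priors are hypothetical, and the unanimity condition is only spot-checked for a handful of increasing utilities (listed in `UTILITIES`), so it does not prove the equivalence.

```python
from fractions import Fraction as F

# Hypothetical priors on three states; consequences are numbers ranked by <=.
PRIORS = [
    {"a": F(4, 5), "b": F(1, 5), "c": F(0)},
    {"a": F(4, 5), "b": F(0), "c": F(1, 5)},
    {"a": F(1, 2), "b": F(1, 2), "c": F(0)},
    {"a": F(1, 2), "b": F(3, 10), "c": F(1, 5)},
]
STATES = ("a", "b", "c")

def less_likely(A, B):
    """Unanimity ranking of events over the set of priors."""
    return all(sum(pi[s] for s in A) <= sum(pi[s] for s in B) for pi in PRIORS)

def lower_set(act, x):
    """States in which the act yields a consequence no better than x."""
    return {s for s in STATES if act[s] <= x}

def dominates(f, g):
    """Definition 4, checked at the finitely many consequences the acts take."""
    thresholds = set(f.values()) | set(g.values())
    return all(less_likely(lower_set(f, x), lower_set(g, x)) for x in thresholds)

def unanimous(f, g, u):
    """Expected utility of f is at least that of g under every prior."""
    return all(sum(pi[s] * u(f[s]) for s in STATES)
               >= sum(pi[s] * u(g[s]) for s in STATES) for pi in PRIORS)

BET_A = {"a": 1, "b": 0, "c": 0}
BET_B = {"a": 0, "b": 1, "c": 0}
BET_C = {"a": 0, "b": 0, "c": 1}
UTILITIES = [lambda x: x, lambda x: x ** 3, lambda x: 2 * x - 5]
```

In this example the bet on `a` dominates the bet on `b` although it is not better state by state, and the bets on `b` and `c` are not ranked in either direction.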

Suppose now the preference relation satisfies the regularity conditions of continuity and certainty-independence in order to obtain a cardinal utility representation of preferences over $Y$.

Axiom 4 (Certainty Independence) If $f, g \in \mathcal{F}$, $x \in Y$ and $\lambda \in (0, 1]$, then $f \preccurlyeq g \iff \lambda f + (1 - \lambda)x \preccurlyeq \lambda g + (1 - \lambda)x$.

Note that this axiom, which is an essential part of the axiomatization of Gilboa and Schmeidler (1989), is far weaker than Savage’s independence axiom (below), which guarantees linearity of the functional representing the preference relation.

Axiom 4∗ (Independence) If $f, g, h \in \mathcal{F}$ and $\lambda \in (0, 1]$, then $f \preccurlyeq g \iff \lambda f + (1 - \lambda)h \preccurlyeq \lambda g + (1 - \lambda)h$.

For completeness, we give an intermediate requirement, that of comonotonic independence, introduced by Schmeidler (1989) to axiomatize Choquet Expected Utility. Recall that two acts $f$ and $g$ are comonotonic if for no pair of states $s, t$ we have $f(s) \prec f(t)$ and $g(t) \prec g(s)$.

Axiom 4∗∗ (Comonotonic Independence) For all pairwise comonotonic acts $f, g, h \in \mathcal{F}$ and all $\lambda \in (0, 1]$, we have $f \preccurlyeq g \iff \lambda f + (1 - \lambda)h \preccurlyeq \lambda g + (1 - \lambda)h$.

The continuity requirement (which allows preferences on the prize space to be represented by a real valued utility function) is the following:

Axiom 5 (Archimedean Property) If $f, g, h \in \mathcal{F}$, $f \prec g$ and $g \prec h$, then there exist $\lambda, \mu \in (0, 1)$ such that $\lambda f + (1 - \lambda)h \prec g$ and $g \prec \mu f + (1 - \mu)h$.

Note that Axioms 4 and 5 minimally ensure that $\preccurlyeq$ is an invariant bi-separable preference in the terminology of Ghirardato, Maccheroni, and Marinacci (2004). In other words, they are minimal requirements for $\preccurlyeq$ to be represented by a unique functional, and for that functional to be separable on binary acts in a way that uniquely distinguishes utility and beliefs. The following lemma (which combines Lemma 1 and claim (3) of Ghirardato, Maccheroni, and Marinacci (2004)) summarizes these claims. A functional $I$ on the set $B_0(\Sigma)$ of finite valued $\Sigma$-measurable functions is called constant linear if for all $a, b \in \mathbb{R}$, $a \geq 0$, and all $\varphi \in B_0(\Sigma)$, $I(a\varphi + b) = aI(\varphi) + b$.

Lemma 1 (Invariant Biseparable Preference) The following statements hold true:
(i) A binary relation $\preccurlyeq$ on $\mathcal{F}$ satisfies Axioms 1-5 if and only if there exists a monotonic, constant linear functional $I$ and a nonconstant affine function $u$ such that $f \preccurlyeq g \iff I(u(f)) \leq I(u(g))$.
(ii) $I$ is unique and $u$ is unique up to positive affine transformations.
(iii) $I$ is independent of the chosen normalization of the utility $u$.
(iv) There exists a capacity (see Definition 13) $\rho$, called willingness to bet, such that for all $x, y \in Y$ such that $x \succcurlyeq y$, and all events $A \in \Sigma$, $I(u(xAy)) = u(x)\rho(A) + u(y)(1 - \rho(A))$.

Notice that if $\preccurlyeq$ is compatible with the partial likelihood represented by the set of priors $\Pi$, then the willingness to bet is bounded by the lower and upper envelopes of $\Pi$. In other words, for all $A \in \Sigma$, we have

$$\underline{P}(A) = \inf_{\pi \in \Pi} \pi(A) \leq \rho(A) \leq \sup_{\pi \in \Pi} \pi(A) = \overline{P}(A).$$

Now that we have a cardinal utility representation of preferences on the prize space, we can consider a cardinal notion of dominance with respect to the set of priors $\Pi(\Gamma)$:

Axiom SD∗ (Bewley Unanimity) For any two acts $f, g \in \mathcal{F}$, we have

$$\Big[\forall \pi \in \Pi(\Gamma),\ \int_S u(f)\, d\pi \geq \int_S u(g)\, d\pi\Big] \implies f \succcurlyeq g.$$

It is important to note that under the additional requirement that the set of priors $\Pi(\Gamma)$ be convex-ranged (see the definition below, from Nehring (2001)), the Stochastic Dominance axiom (which only refers to ordinal information) implies Bewley Unanimity.

Definition 5 (Convex-ranged set of priors) A set of priors $\Pi$ is convex-ranged if for any event $A$ and any $\alpha \in (0, 1)$, there is an event $B \subset A$ such that $\pi(B) = \alpha\pi(A)$ for all $\pi \in \Pi$.

Lemma 2 If $\Pi$ is convex-ranged, and $\preccurlyeq$ is invariant bi-separable, then Axiom SD implies Axiom SD∗.

We now come to the main result of this section, which gives an operational functional representation for a biseparable preference relation that satisfies stochastic dominance and strict adherence with respect to a set of priors generated by a random correspondence.

Proposition 3a (Generalized Hurwicz Criterion) If $\preccurlyeq$ satisfies Axioms 1-5 and Axioms SA and SD∗, then the functional $I$ representing $\preccurlyeq$ has the following form: for all $f \in \mathcal{F}$,

$$I(u \circ f) = \alpha_f \int_{\tilde{Y}} \min_{\Gamma(\tilde{y})} (u \circ f)\, dP(\tilde{y}) + (1 - \alpha_f) \int_{\tilde{Y}} \max_{\Gamma(\tilde{y})} (u \circ f)\, dP(\tilde{y}), \tag{3}$$

where $\alpha_f$ is a $[0, 1]$-valued generalized Hurwicz index. As was argued in Ghirardato, Maccheroni, and Marinacci (2004), the generalized Hurwicz index $\alpha_f$ embodies the attitude of the decision maker to the incompleteness of the likelihood $\trianglelefteq$.⁴ In particular, when $\preccurlyeq$ satisfies Axiom 6∗ below, or Axiom 6 below and Axiom 4∗∗, then $\alpha_f = 1$ for all $f \in \mathcal{F}$, and $\preccurlyeq$ belongs to the intersection of the classes of maxmin expected utility preferences of Gilboa and Schmeidler (1989) and Choquet expected utility of Schmeidler (1989), as shown in Corollary 1:

Axiom 6 (Preference for hedging of bets) If $x, y, z, t \in Y$ and $A, B \in \Sigma$, then $xAy \sim zBt \Rightarrow \tfrac{1}{2}(xAy + zBt) \succcurlyeq zBt$.

This axiom is the ambiguity aversion axiom of Gilboa and Schmeidler (1989) restricted to simple bets, and as such, it is both much weaker and much easier to confront with observed behaviour.

Axiom 6∗ (Ambiguity Aversion) If $f, g \in \mathcal{F}$, then $f \sim g \Rightarrow \tfrac{1}{2}(f + g) \succcurlyeq g$.

Corollary 1 (MEU and CEU) If $\preccurlyeq$ satisfies Axioms 1-5 and 6∗ and Axioms SA and SD∗, or if it satisfies Axioms 1-3, 4∗∗, 5, 6 and Axioms SA and SD∗, then the functional $I$ representing $\preccurlyeq$ has the following form: for all $f \in \mathcal{F}$,

$$I(u \circ f) = \int_{\tilde{Y}} \min_{\Gamma(\tilde{y})} (u \circ f)\, dP(\tilde{y}) = \min_{\pi \in \Pi(\Gamma)} \int_S u \circ f(s)\, d\pi(s) = \int_{Ch} u \circ f\, d\underline{P},$$

where the third integral is taken in the sense of Choquet with respect to the lower envelope $\underline{P}$ of $\Pi(\Gamma)$. In the case where $\alpha_f$ is not constant over $\mathcal{F}$, we need to exhibit further restrictions on its values, and a domain on which it is uniquely determined. To that end, we investigate the unambiguous structure generated by the incomplete likelihood $\trianglelefteq$, which is also of independent interest.

⁴ Notice that strict adherence to the incomplete likelihood, i.e. Axiom SA, rules out decision making functionals that reveal additional subjective beliefs, such as expected utility, if indeed the objective likelihood is incomplete.

First and foremost, we define unambiguous events in the sense of Nehring (2001) and Ghirardato, Maccheroni, and Marinacci (2004).

Definition 6 (Unambiguous Events) A measurable subset $U$ of $S$ is called an unambiguous event for the decision maker with preference relation satisfying Axioms 1-5, and Axioms SA and SD∗ if and only if $\pi(U) = \tilde{\pi}(U)$ for all $\pi, \tilde{\pi} \in \Pi(\Gamma)$.

The choice of probabilistic representation of information generated by a random correspondence does not affect the definition of unambiguous events, since a measurable subset $U$ of $S$ is unambiguous if and only if $\underline{P}(U) = \overline{P}(U)$. Generally, it was shown by Ghirardato, Maccheroni, and Marinacci (2004) that the class $\mathcal{U}$ of unambiguous events for a preference relation satisfying Axioms 1-5 is a λ-system. However, in the case of strict adherence to a partial likelihood generated by a random correspondence, we can be more precise.

Lemma 3a (Algebra of unambiguous events) The class $\mathcal{U}$ of unambiguous events for a preference relation satisfying Axioms 1-5, and Axioms SA and SD∗ is an algebra.

Corollary 2 (Probability on unambiguous events) The restriction to $\mathcal{U}$ of $\Pi(\Gamma)$ is an additive probability measure.

Corollary 3 (SEU on unambiguous acts) The restriction to $\mathcal{U}$-measurable acts of a preference relation satisfying Axioms 1-5, and Axioms SA and SD∗ is Savage expected utility.

Notice that $\mathcal{U}$-measurable acts are linear combinations of indicator functions of elements of the partition $\varpi$ that generates the σ-algebra $\mathcal{U}$. Call the set of such acts $\mathcal{F}_\varpi$. Hence, Corollary 3 follows immediately from the fact that all elements of $\mathcal{F}_\varpi$ belong to the maximal restriction of $\preccurlyeq$ which satisfies Axiom 4∗, which in turn is implied by the following:

Lemma 3b (Partition independence) If $\preccurlyeq$ satisfies Axioms 1-5 and Axioms SA and SD∗, then it satisfies Axiom 4∗∗∗ below.

Axiom 4∗∗∗ (Partition independence) If $f, g \in \mathcal{F}$, $h \in \mathcal{F}_\varpi$ and $\lambda \in [0, 1]$, then $f \sim g$ implies $\lambda f + (1 - \lambda)h \sim \lambda g + (1 - \lambda)h$.

A more salient feature is given by the following Axiom:

Axiom 8 (No non trivial unambiguous equivalence) If $f, g \in \mathcal{F}$ are such that for all $h \in \mathcal{F}$ and $\lambda \in [0, 1]$, $\lambda f + (1 - \lambda)h \sim \lambda g + (1 - \lambda)h$, then there are $a, a' \in \mathcal{F}_\varpi$ and $\lambda \in (0, 1]$ such that for all $A \in \Sigma$, $(\lambda f + (1 - \lambda)a)\, A\, (\lambda g + (1 - \lambda)a') \sim (\lambda f + (1 - \lambda)a)$.

This axiom essentially says that two acts are unambiguously equivalent if and only if their “difference” is pointwise indifferent to an element of $\mathcal{F}_\varpi$. We have:

Lemma 3c (No non trivial unambiguous equivalence) If $\preccurlyeq$ satisfies Axioms 1-5 and Axioms SA and SD∗, then it satisfies Axiom 8.

The main implication of the lemma above is that in this setting, crisp acts in the sense of Ghirardato, Maccheroni, and Marinacci (2004) are exactly the unambiguous acts, i.e. acts that are pointwise indifferent to elements of $\mathcal{F}_\varpi$. Denote the set of unambiguous acts $\tilde{\mathcal{F}}_\varpi$. We are finally in a position to state the properties of the generalized Hurwicz index $\alpha_f$:

Proposition 3b (Generalized Hurwicz index) The generalized Hurwicz index of Proposition 3a is unique on $\mathcal{F} \setminus \tilde{\mathcal{F}}_\varpi$ and satisfies $\alpha_f = \alpha_g$ for all $f, g \in \mathcal{F}$ such that $(\lambda f + (1 - \lambda)a)\, A\, (\lambda' g + (1 - \lambda')a') \sim (\lambda f + (1 - \lambda)a)$ for some $a, a' \in \mathcal{F}_\varpi$, some $\lambda, \lambda' \in (0, 1]$ and all $A \in \Sigma$.

In other words, the generalized Hurwicz index is constant over equivalence classes of acts of similar ambiguity, where two acts have similar ambiguity whenever there is some “linear combination” of the two which is unambiguous.
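The results of this section can be illustrated on a finite example. The Python sketch below rests on several assumptions introduced purely for illustration: a three-point outcome space with law `P`, a correspondence `GAMMA` into four states, utilities given directly as numbers per state, and a Hurwicz index `alpha` held fixed across acts (in the text the index may depend on the act). It evaluates criterion (3), compares the three expressions of Corollary 1 when the index equals 1, and lists the unambiguous events as those whose belief and plausibility coincide.

```python
from fractions import Fraction as F
from itertools import combinations, product

# Hypothetical finite example: outcome law P and correspondence GAMMA on four states.
P = {0: F(1, 2), 1: F(3, 10), 2: F(1, 5)}
GAMMA = {0: {"a"}, 1: {"a", "b"}, 2: {"c", "d"}}
STATES = ("a", "b", "c", "d")

def hurwicz(util, alpha):
    """Criterion (3) with a fixed index alpha shared by all acts (a simplification)."""
    worst = sum(p * min(util[s] for s in GAMMA[y]) for y, p in P.items())
    best = sum(p * max(util[s] for s in GAMMA[y]) for y, p in P.items())
    return alpha * worst + (1 - alpha) * best

def maxmin(util):
    """Smallest expected utility over the priors induced by selections of GAMMA."""
    outcomes = sorted(GAMMA)
    values = []
    for choice in product(*(sorted(GAMMA[y]) for y in outcomes)):
        values.append(sum(P[y] * util[s] for y, s in zip(outcomes, choice)))
    return min(values)

def belief(event):
    """Mass of the outcomes whose image lies inside the event."""
    return sum(p for y, p in P.items() if GAMMA[y].issubset(set(event)))

def choquet(util):
    """Choquet integral of util with respect to the belief function."""
    total, previous = F(0), 0
    for level in sorted(set(util.values())):
        upper = {s for s in STATES if util[s] >= level}
        total += (level - previous) * belief(upper)
        previous = level
    return total

def unambiguous_events():
    """Events whose belief and plausibility coincide."""
    subsets = [frozenset(c) for r in range(len(STATES) + 1)
               for c in combinations(STATES, r)]
    return {A for A in subsets if belief(A) + belief(set(STATES) - A) == 1}
```

For the utility profile `{"a": 1, "b": 4, "c": 2, "d": 3}`, the three expressions `hurwicz(util, 1)`, `maxmin(util)` and `choquet(util)` all equal 6/5. The unambiguous events of this example are the empty set, `{a, b}`, `{c, d}` and the whole state space, and an act that is constant on `{a, b}` and on `{c, d}` receives the same value for every `alpha`.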

3 Representation Theorem and Reasoning by Analogy

Once we have identified the most salient characteristic of a preference relation that adheres strictly to an objective set of priors derived from a random correspondence, the natural question to ask is whether we can give a fully behavioural account of such a preference relation. In other words, we have seen in the previous section that it is represented by a generalized Hurwicz criterion; can we be more precise and give a representation theorem? This is the object of this section.

We now propose to derive from certain regularities on the weak order on simple acts a subjective version of the random correspondence Γ in the spirit of the Savage subjective probability measure on the state space. To do so, we first place ourselves in the special case of Corollary 1 and we enlarge our set of primitives and confront the decision maker with two distinct problems. On the one hand, the decision maker has to choose among simple acts in a state space (Ω, 𝒜). The choice among acts in the latter is a familiar problem, and such familiarity is modelled by the fact that the decision maker’s preference relation $\preccurlyeq_\Omega$ on simple acts in (Ω, 𝒜) satisfies the axioms of the Savage representation theorem (Axioms 1-3, 4∗ and 5). Hence the decision maker chooses as if they were maximizing expected utility with respect to a single probability measure. On the other hand, the decision maker is confronted with an unfamiliar problem, one where the sole possible guidance in the choice among acts in an unfamiliar state space (S, Σ), beyond basic rationality requirements (Axioms 2 and 3), is provided by the ability to compare the two problems. The comparison takes the form of an analogy, in the sense that the decision maker uses their knowledge of the familiar problem, together with an understanding of the way the familiar and the unfamiliar problems are related, in order to form a ranking of the acts in the unfamiliar state space.

We provide here necessary and sufficient conditions under which a preference relation on the unfamiliar state space can be represented by such an analogy, and the analogy is formalized as a multi-valued function from the familiar state space to the unfamiliar one. The representation theorem hinges on an equivalence between lower envelopes of sets of probabilities induced by a random correspondence, Dempster belief functions and infinitely monotone Choquet capacities. As a result, an important by-product of our analysis is a new perspective on Dempster’s model of generalized Bayesian inference, described in Dempster (1968).

For this section, we assume the space of consequences or prizes Y to be an affine space, in order to avoid writing regularity conditions dependent on the utility function. First we introduce additional definitions derived from a preference relation ≼ that satisfies Axioms 1-5.
The first is a restriction of ≼ called unambiguous preference and was proposed by Nehring (2001) and Ghirardato, Maccheroni, and Marinacci (2004) as a step towards eliciting beliefs. Loosely speaking, the relation contains the pairs of acts that would be ranked identically by two decision makers with identical beliefs and different attitudes to ambiguity.

Definition 7 (Unambiguous Preference) ⪯ is the maximal restriction of ≼ that satisfies Axiom 4∗ (Independence).

We introduce the second derivative of ≼, called unambiguous envelope preference, which contains all pairs of acts f, g that are just unambiguously related, i.e. f is unambiguously preferred to g, but to no uniform improvement on g.

Definition 8 (Unambiguous Envelope Preference) ⊒ is defined by f ⊒ g if and only if f ⪯ g and, for any ε, η ∈ Y such that ε ≻ η, f + ε ≻ g + η.

We are now in a position to introduce the regularity axioms needed to ensure that the set of priors that we interpret as representing the decision maker’s beliefs is the set of priors induced by a random correspondence. They are of a similar nature to the Arrow (1970) monotone continuity axiom:

Axiom A1 (Monotone Continuity) If f, g ∈ F, x ∈ Y, $\{E_n\}_{n\ge 1} \subseteq \Sigma$ with $E_1 \supseteq E_2 \supseteq \ldots$ and $\bigcap_{n\ge 1} E_n = \emptyset$, then f ≻ g implies that there exists $n_0 \ge 1$ such that $x E_{n_0} f \succ g$.

They are regularity properties of the weak order with no behavioural interpretation, but which provide a representation with operationally convenient added structure, i.e. countable additivity of the subjective probability in the case of Arrow’s Axiom A1, and the structure of random correspondences here. In the next section, we give some salient necessary conditions on the preference with behavioural interpretation.

Axiom 7-k (k-Monotonicity) If $E_i \in \Sigma$, i = 1, . . . , k, and $x, y, x_I, y_I \in Y$ are such that x ≻ y and $x_I \succ y_I$, and $\alpha, \alpha_I > 0$, and $x \left(\bigcup_{i=1}^{k} E_i\right) y \sqsubseteq \alpha(x - y)$ and $x_I \left(\bigcap_{i\in I} E_i\right) y_I \sqsubseteq \alpha_I (x_I - y_I)$ for all I ⊆ {1, . . . , k}, then

$$
\alpha \;\ge\; \sum_{I \subseteq \{1,\ldots,k\}} (-1)^{|I|+1}\, \alpha_I .
$$

Axiom 7-∞ (Total Monotonicity) Axiom 7-k holds for all k ≥ 2.

As a final ingredient in our representation, we ensure that the familiar decision problem is sufficiently versatile to lend itself to the analogy. To that end, we add Axiom A2 from Arrow (1970) to ensure that the decision maker’s subjective probability over the familiar state space is non-atomic.

Axiom A2 (Atomlessness) For all A ∈ 𝒜 such that $I_A \succ_\Omega 0$, there exists B ∈ 𝒜 such that $I_A \succ_\Omega I_B \succ_\Omega 0$.

We can now state the representation theorem in terms of an analogy between a familiar state space (Ω, 𝒜) and an unfamiliar one (S, Σ). The decision maker has a preference $\preccurlyeq_\Omega$ on the set $F_\Omega$ of simple acts from Ω into a consequence space $X_\Omega$, and a preference $\preccurlyeq_S$ on the set $F_S$ of simple acts from S into a consequence space $X_S$. Each of the two problems satisfies the conditions of section 1.

Theorem 2a (Reasoning by analogy) Two binary relations $\preccurlyeq_\Omega$ on $F_\Omega$ and $\preccurlyeq_S$ on $F_S$ satisfy Axioms 1-3, 4∗, 5, A1 and A2 and Axioms 1-5, 6∗, 7-∞ and A1 respectively if and only if there exists an analogy, i.e. a measurable correspondence Γ : Ω ⇒ S, a countably additive probability measure P on (Ω, 𝒜) and linear utility functions $u_S$ and $u_\Omega$ such that for all f, g ∈ $F_S$ and all h, l ∈ $F_\Omega$,

$$
h \preccurlyeq_\Omega l \iff \int_\Omega (u_\Omega \circ h)\, dP \;\le\; \int_\Omega (u_\Omega \circ l)\, dP
$$

and

$$
f \preccurlyeq_S g \iff \int_\Omega \inf_{\Gamma(\omega)} (u_S \circ f)\, dP(\omega) \;\le\; \int_\Omega \inf_{\Gamma(\omega)} (u_S \circ g)\, dP(\omega).
$$

Moreover, $u_S$ and $u_\Omega$ are unique up to positive affine transformations, P is unique and the analogy Γ is unique in the sense of the following theorem.

Theorem 2b (Uniqueness of the analogy) The analogy used by the decision maker is unique in the following sense: if $\preccurlyeq_S$ admits two analogies Γ : Ω ⇒ S and Γ′ : Ω′ ⇒ S, then there exists a one-to-one mapping ζ : Ω → Ω′ such that ζ and $\zeta^{-1}$ are measurable, $P' = P\zeta^{-1}$, Γ = Γ′ ◦ ζ and Γ′ = Γ ◦ $\zeta^{-1}$.
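To make the criterion of Theorem 2a concrete, the following is a minimal sketch on finite spaces. The state names, the probabilities, the correspondence `Gamma` and the utility profiles are hypothetical values chosen for illustration only; they do not come from the paper. The sketch ranks two acts on the unfamiliar space by integrating, over the familiar space, the worst utility attained on Γ(ω).

```python
from fractions import Fraction

# Hypothetical familiar space Omega with probability P (illustrative numbers).
P = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4)}

# Hypothetical analogy: each familiar state maps to a set of unfamiliar states.
Gamma = {"w1": {"s1"}, "w2": {"s1", "s2"}, "w3": {"s2", "s3"}}


def analogy_value(utility, P, Gamma):
    """Integral over Omega of the infimum of the utility profile on Gamma(omega)."""
    return sum(p * min(utility[s] for s in Gamma[w]) for w, p in P.items())


# Utility profiles u_S(f(s)) and u_S(g(s)) of two acts f and g (assumed values).
u_f = {"s1": 3, "s2": 0, "s3": 8}
u_g = {"s1": 2, "s2": 2, "s3": 2}

value_f = analogy_value(u_f, P, Gamma)
value_g = analogy_value(u_g, P, Gamma)

# f is weakly less preferred than g exactly when value_f <= value_g.
f_weakly_below_g = value_f <= value_g
```

In this toy example the constant act g is ranked above f even though f pays more in two of the three states, because the states where f pays well are never the only element of Γ(ω) with enough probability.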

Appendix A: Random correspondences and decomposability

Consider (Ω, 𝒜) a measurable space, and (S, ℬ) a Polish space (i.e. a space with a complete, separable and metrizable topology) endowed with its Borel σ-algebra (the σ-algebra generated by the open sets), and Γ a correspondence from Ω into subsets of S. Denote by d a distance that metrizes the topology on S, and, as usual, define for any closed subset C of S and any s ∈ S, d(s, C) = inf{d(s, c) : c ∈ C}.

Definition 10 (Castaing Representation) A Castaing representation of a correspondence Γ is a sequence $(\gamma_n)_{n\ge 1}$ of measurable selections such that, for every ω ∈ Dom(Γ), $\Gamma(\omega) = \mathrm{cl}\{\gamma_n(\omega) : n \ge 1\}$ (where cl(A) denotes the closure of the set A).

Theorem 3 (Castaing) If Γ has closed non-empty values, the following statements are equivalent:
(i) Γ is Effros measurable
(ii) For all s ∈ S, the map d(s, Γ(·)) is 𝒜-measurable
(iii) Γ admits a Castaing representation
(For a proof, see for instance Theorem 8.3.1 page 319 of Aubin and Frankowska (1990).)

Denote by Sel(Γ) the collection of measurable selections of the correspondence Γ.

Definition 11 (Decomposability) Let 𝒮 be a set of measurable functions from the measurable space (Ω, 𝒜) to the measurable space (S, ℬ). 𝒮 is called decomposable with respect to 𝒜 if, whenever φ, ψ ∈ 𝒮 and A ∈ 𝒜, φAψ ∈ 𝒮.

The proof of the following lemma uses a construction that Hiai and Umegaki (1977) proposed to define generalized conditional expectations.

Lemma 4 If (Ω, 𝒜) is endowed with a probability measure P, and if S is compact, then for any non empty collection 𝒮 of bounded measurable functions, there exists a closed-valued Effros measurable correspondence Γ such that Sel(Γ) = 𝒮 if and only if 𝒮 is decomposable.

Proof of Lemma 4 From the Castaing Theorem applied to the trivial correspondence that to any ω associates the whole space S, there exists a sequence of measurable functions $\{\gamma_n\}_{n\ge 1}$ from Ω to S such that for all ω ∈ Ω, $\{\gamma_n(\omega)\}_{n\ge 1}$ is dense in S. Since $S^\Omega$ is metrizable, we can consider

$$
\zeta_n = \inf_{\gamma \in \mathcal{S}} d(\gamma, \gamma_n),
$$

and a sequence $\gamma_{nm} \in \mathcal{S}$ such that for all n ≥ 1,

$$
\lim_{m\to\infty} d(\gamma_{nm}, \gamma_n) = \zeta_n .
$$

We are now in a position to define a correspondence Γ by $\Gamma(\omega) = \mathrm{cl}\{\gamma_{nm}(\omega)\}_{n,m\ge 1}$ for all ω ∈ Ω.

We now claim that for each γ ∈ Sel(Γ) and each ε > 0, there exists a finite partition of Ω in measurable sets $\{E_1, \ldots, E_K\}$ and functions $\{\gamma_1, \ldots, \gamma_K\}$ in the sequence $\{\gamma_{nm}\}_{n,m\ge 1}$ such that $d\left(\gamma, \sum_{k=1}^{K} \gamma_k I_{E_k}\right) < \varepsilon$.⁵ To prove the claim, take $\{\gamma_k\}_{k\ge 1}$ to be a re-enumeration of $\{\gamma_{nm}\}_{n,m\ge 1}$ and take a strictly positive P-integrable real valued function ξ such that $\sup_\Omega \xi(\omega) < \varepsilon$. There exists a countable partition of Ω in measurable sets $\{A_k\}_{k\ge 1}$ such that for all k ≥ 1, $d(\gamma(\omega), \gamma_k(\omega)) < \xi(\omega)$ for all ω ∈ $A_k$. Choose K such that

$$
\sup_{\bigcup_{k=K+1}^{\infty} A_k} \left[ d(\gamma(\omega), \gamma_1(\omega)) \right] < \varepsilon ,
$$

then, putting $E_k = A_k$, 2 ≤ k ≤ K, and $E_1 = A_1 \cup \left( \bigcup_{k=K+1}^{\infty} A_k \right)$, we have

$$
d\left(\gamma, \sum_{k=1}^{K} \gamma_k I_{E_k}\right) = \max_{1\le k\le K} \sup_{E_k} d(\gamma(\omega), \gamma_k(\omega)) \le \max\left( \sup_{\bigcup_{k=K+1}^{\infty} A_k} \left[ d(\gamma(\omega), \gamma_1(\omega)) \right],\; \sup_{\Omega} \xi(\omega) \right) < \varepsilon ,
$$

which completes the proof of the claim. Since we assumed 𝒮 decomposable, $\sum_{k=1}^{K} \gamma_k I_{E_k}$ is in 𝒮, hence, by virtue of the claim above, we have proved that Sel(Γ) ⊆ 𝒮.

Suppose now that there exists a γ in 𝒮 but not in Sel(Γ). Hence, there exist E ⊂ Ω with P(E) > 0 and η > 0 such that for all ω ∈ E,

$$
\inf_{n,m\ge 1} d(\gamma(\omega), \gamma_{nm}(\omega)) > \eta .
$$

Take N such that the set $A = E \cap \{\omega \in \Omega : d(\gamma(\omega), \gamma_N(\omega)) < \eta/3\}$ has positive probability, and consider for all m ≥ 1, $\gamma_m^* = \gamma A \gamma_{Nm}$. Since 𝒮 is decomposable by assumption, $\gamma_m^* \in \mathcal{S}$, and

$$
d(\gamma_N(\omega), \gamma_{Nm}(\omega)) \ge d(\gamma(\omega), \gamma_{Nm}(\omega)) - d(\gamma(\omega), \gamma_N(\omega)) > 2\eta/3 .
$$

Hence,

$$
d(\gamma_N(\omega), \gamma_{Nm}(\omega)) - \zeta_N \ge d(\gamma_N(\omega), \gamma_{Nm}(\omega)) - d(\gamma_N(\omega), \gamma_m^*(\omega)) \ge 2\eta/3 - \eta/3 > 0,
$$

which leads to a contradiction when we let m go to infinity. It is clear that Γ so defined is Effros measurable, and that, conversely, the set of measurable selections of a measurable correspondence is decomposable. □

⁵ Note that $I_E(\omega) = 1$ when ω ∈ E and zero otherwise, and that $\sum_{k=1}^{K} \gamma_k I_{E_k}$ is a convenient notation for the function that is equal to $\gamma_k$ on $E_k$, all k = 1, . . . , K.

Let us now consider two Polish spaces endowed with their Borel σ-algebras, viz. (Ω, 𝒜) and (S, ℬ). Suppose we are given an additive probability measure P on (Ω, 𝒜) and a subset Π of the set M of additive probability measures on (S, ℬ).

Lemma 5 For any non atomic probability measure π on (S, ℬ), there exists an onto (𝒜, ℬ)-measurable function γ such that for all B ∈ ℬ, $\pi(B) = P(\gamma^{-1}(B)) \equiv P(\{\omega \in \Omega : \gamma(\omega) \in B\})$. π will be called the γ push-forward of P and will be denoted $\pi = P\gamma^{-1}$.

Proof of Lemma 5 Immediate corollary to the von Neumann Theorem (page 69 of Billingsley (1965)). □

Definition 12 We call representation of Π a set 𝒮 of onto (𝒜, ℬ)-measurable functions such that $\Pi = \{\pi \in M(S) : \exists \gamma \in \mathcal{S},\ \pi = P\gamma^{-1}\}$.

We now investigate the properties of sets of measures that admit a decomposable representation. To that end, we introduce two set functions φ and Φ defined for all B ∈ ℬ as

$$
\varphi(B) = \inf_{\mathcal{S}_c \subseteq \mathcal{S}} P\left( \bigcap_{\gamma \in \mathcal{S}_c} \gamma^{-1}(B) \right)
\quad\text{and}\quad
\Phi(B) = \sup_{\mathcal{S}_c \subseteq \mathcal{S}} P\left( \bigcup_{\gamma \in \mathcal{S}_c} \gamma^{-1}(B) \right),
$$

where the infimum and supremum are over all countable subfamilies $\mathcal{S}_c$ of functions in 𝒮. We first show the following

Lemma 6 (Envelopes) If a set of probability measures Π has a decomposable representation 𝒮, then for all B ∈ ℬ,

$$
\inf_{\pi \in \Pi} \pi(B) = \varphi(B) \quad\text{and}\quad \sup_{\pi \in \Pi} \pi(B) = \Phi(B).
$$

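Proof of Lemma 6 For any B ∈ ℬ, any two measures $\pi_1$ and $\pi_2$ in Π (possibly identical when Π is a singleton) and $\gamma_1, \gamma_2 \in \mathcal{S}$ such that $\pi_i = P\gamma_i^{-1}$, i = 1, 2, the measure $\tilde{\pi} = P\tilde{\gamma}^{-1}$ with $\tilde{\gamma} = \gamma_1 A \gamma_2$, A ∈ 𝒜, is in Π by virtue of the decomposability of 𝒮. Now notice that

$$
\begin{aligned}
\tilde{\pi}(B) &= P\tilde{\gamma}^{-1}(B) \\
&= P\{\omega \in \Omega : \gamma_1 A \gamma_2(\omega) \in B\} \\
&= P\left(\{\omega \in A : \gamma_1(\omega) \in B\} \cup \{\omega \in A^c : \gamma_2(\omega) \in B\}\right) \\
&= P(A \cap \gamma_1^{-1}(B)) + P(A^c \cap \gamma_2^{-1}(B)).
\end{aligned}
$$

Hence, choosing $A = \gamma_2^{-1}(B)$, we get $\tilde{\pi}(B) = P(\gamma_1^{-1}(B) \cap \gamma_2^{-1}(B))$, whereas choosing $A = \gamma_1^{-1}(B)$ yields $\tilde{\pi}(B) = P(\gamma_1^{-1}(B) \cup \gamma_2^{-1}(B))$. This reasoning can readily be extended to any countable subset $\mathcal{S}_c$ of 𝒮 to yield the existence of $\pi_*$ and $\pi^* \in \Pi$ such that

$$
\pi_*(B) = P\left( \bigcap_{\gamma \in \mathcal{S}_c} \gamma^{-1}(B) \right)
\quad\text{and}\quad
\pi^*(B) = P\left( \bigcup_{\gamma \in \mathcal{S}_c} \gamma^{-1}(B) \right).
$$

The result follows. □

We now consider the properties of φ and Φ. Note that both are monotone and normalized, i.e. for ν = φ or Φ,

$$
\nu(\emptyset) = 0, \quad \nu(S) = 1, \qquad A, B \in \mathcal{B},\ A \subseteq B \Rightarrow \nu(A) \le \nu(B).
$$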
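The pasting step in the proof of Lemma 6 can be checked on a small finite example. The sketch below is only an illustration under assumed data: the four-point space, the uniform probability and the two functions `gamma1` and `gamma2` are hypothetical, and the onto requirement of Definition 12 is ignored for simplicity. It pastes the two functions along the two choices of A used in the proof and recovers the probabilities of the intersection and of the union of the preimages of B.

```python
from fractions import Fraction

# Hypothetical finite probability space (uniform weights, for illustration only).
P = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}

# Two hypothetical measurable functions from Omega to S = {"a", "b"}.
gamma1 = {1: "a", 2: "a", 3: "b", 4: "b"}
gamma2 = {1: "a", 2: "b", 3: "a", 4: "b"}
B = {"a"}


def preimage(gamma, B):
    """Set of omegas that gamma sends into B."""
    return {w for w, s in gamma.items() if s in B}


def paste(g1, A, g2):
    """The function g1 A g2: equal to g1 on A and to g2 on the complement of A."""
    return {w: (g1[w] if w in A else g2[w]) for w in g1}


def push_forward(gamma, B):
    """Probability that gamma lands in B, i.e. (P gamma^{-1})(B)."""
    return sum(P[w] for w in preimage(gamma, B))


# Choosing A = gamma2^{-1}(B) gives the probability of the intersection.
low = push_forward(paste(gamma1, preimage(gamma2, B), gamma2), B)
# Choosing A = gamma1^{-1}(B) gives the probability of the union.
high = push_forward(paste(gamma1, preimage(gamma1, B), gamma2), B)

p_intersection = sum(P[w] for w in preimage(gamma1, B) & preimage(gamma2, B))
p_union = sum(P[w] for w in preimage(gamma1, B) | preimage(gamma2, B))
```

Here `low` and `high` coincide with `p_intersection` and `p_union`, which is the two-function case of the envelope formulas for φ and Φ.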

Definition 13 (Capacity) A set function that is monotone and normalized as above and satisfies continuity properties (i) and (ii) below is called a Choquet capacity.
(i) $\nu(B_n) \uparrow \nu(B)$ for all sequences of Borel sets $B_n \uparrow B$
(ii) $\nu(C_n) \downarrow \nu(C)$ for all sequences of closed sets $C_n \downarrow C$
(Choquet (1954))

Definition 14 (Cocapacity) A set function that is monotone and normalized as above and satisfies continuity properties (i’) and (ii’) below is called a cocapacity.
(i’) $\nu(B_n) \downarrow \nu(B)$ for all sequences of Borel sets $B_n \downarrow B$
(ii’) $\nu(G_n) \uparrow \nu(G)$ for all sequences of open sets $G_n \uparrow G$
(Philippe, Debs, and Jaffray (1999))

Definition 15 (Convex, supermodular, 2-monotone) A set function ν is called convex if for all A, B ∈ ℬ, ν(A) + ν(B) ≤ ν(A ∪ B) + ν(A ∩ B). A set function that satisfies the reverse inequality is called 2-alternating, or submodular.

Definition 16 (infinitely monotone, totally supermodular) A set function is called infinitely monotone if for any n and any sequence $A_1, \ldots, A_n$ of Borel sets,

$$
\nu\left( \bigcup_{i=1}^{n} A_i \right) \ge \sum_{\emptyset \ne I \subseteq \{1,2,\ldots,n\}} (-1)^{|I|+1}\, \nu\left( \bigcap_{i \in I} A_i \right).
$$

Definition 17 (infinitely alternating, totally submodular) A set function is called infinitely alternating if for any n and any sequence $A_1, \ldots, A_n$ of Borel sets,

$$
\nu\left( \bigcap_{i=1}^{n} A_i \right) \le \sum_{\emptyset \ne I \subseteq \{1,2,\ldots,n\}} (-1)^{|I|+1}\, \nu\left( \bigcup_{i \in I} A_i \right).
$$
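The inequality of Definition 16 can be tested by brute force on a finite space. The following sketch assumes a hypothetical three-point space S, a hypothetical probability P and a hypothetical correspondence `Gamma`; none of these values come from the paper. It checks the inequality for n = 2 and n = 3 for the set function A ↦ P(Γ ⊆ A), and shows that the set function A ↦ P(Γ ∩ A ≠ ∅) already fails the case n = 2 in this example.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical finite example (all names and numbers are illustrative assumptions).
S = frozenset({"s1", "s2", "s3"})
P = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4)}
Gamma = {
    "w1": frozenset({"s1"}),
    "w2": frozenset({"s1", "s2"}),
    "w3": frozenset({"s2", "s3"}),
}
EVENTS = [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]


def lower(A):
    """P(Gamma is contained in A)."""
    return sum(p for w, p in P.items() if Gamma[w] <= A)


def upper(A):
    """P(Gamma meets A)."""
    return sum(p for w, p in P.items() if Gamma[w] & A)


def is_n_monotone(nu, n):
    """Check the inequality of Definition 16 for every family of n events."""
    for family in product(EVENTS, repeat=n):
        union = frozenset().union(*family)
        alternating_sum = sum(
            (-1) ** (r + 1) * nu(S.intersection(*I))
            for r in range(1, n + 1)
            for I in combinations(family, r)
        )
        if nu(union) < alternating_sum:
            return False
    return True


lower_is_monotone = all(is_n_monotone(lower, n) for n in (2, 3))
upper_is_2_monotone = is_n_monotone(upper, 2)
```

A finite check of this kind only covers the orders n that are enumerated; it illustrates the definition rather than proving total monotonicity.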

Lemma 7 If Π admits a decomposable representation, then φ is an infinitely monotone cocapacity and Φ is an infinitely alternating capacity.

Proof of Lemma 7 Since φ and Φ are dual in the sense that for all B ∈ ℬ, $\varphi(B) = 1 - \Phi(B^c)$, we need only prove that Φ is an infinitely alternating capacity. Let 𝒮 be the representation of Π. By Lemma 4, there exists a closed valued Effros measurable correspondence Γ such that 𝒮 = Sel(Γ). Now consider the set function $\tilde{\Phi}$ defined by

$$
\text{for all } B \in \mathcal{B}, \quad \tilde{\Phi}(B) = P(\{\omega \in \Omega : \Gamma(\omega) \cap B \ne \emptyset\}).
$$

By Corollary 1 of Castaldo, Maccheroni, and Marinacci (2004), $\tilde{\Phi}$ is equal to the upper envelope of Π, hence to Φ by the Envelopes Lemma. Hence we can apply a theorem of Choquet’s (in Choquet (1954) section 26.8) to Φ to complete the proof. □

Finally we state the converse of the previous result:

Theorem 4 (Choquet) If $\tilde{\Phi}$ is an infinitely alternating capacity on a Polish space S, then there exists an Effros measurable closed valued correspondence Γ : Ω ⇒ S such that

$$
\text{for all } B \in \mathcal{B}, \quad \tilde{\Phi}(B) = P(\{\omega \in \Omega : \Gamma(\omega) \cap B \ne \emptyset\}).
$$

For a proof, see for instance Lemma 2(ii) of Castaldo, Maccheroni, and Marinacci (2004).

Appendix B: Proofs of results in the main text

Proof of Lemma 2: Let f and g be two acts such that π(u ◦ f) ≥ π(u ◦ g) for all π ∈ Π(Γ). Using the convex rangedness of the set Π(Γ), we can find two sets $A_f$ and $A_g$ in Σ and representations

$$
u \circ f = \sum_{i=1}^{n_f} f_i I_{F_i} \quad\text{and}\quad u \circ g = \sum_{i=1}^{n_g} g_i I_{G_i}
$$

such that for all i and all π ∈ Π(Γ), using subscripts + and − to denote maximum and minimum values, we have

$$
(f_+ - f_-)\,\pi(A_f \cap F_i) = (f_i - f_-)\,\pi(F_i) \quad\text{and}\quad (g_+ - g_-)\,\pi(A_g \cap G_i) = (g_i - g_-)\,\pi(G_i).
$$

Note that the latter is trivially satisfied when f or g is constant. Since π(u ◦ f) ≥ π(u ◦ g) for all π ∈ Π(Γ), we have $\pi(A_f) \ge \pi(A_g)$ for all π, so that $\rho(A_f) \ge \rho(A_g)$, where ρ is the willingness to bet of Lemma 1. The latter implies f ≽ g by constant linearity of the functional I of Lemma 1 and the stochastic dominance axiom. □

Proof of Proposition 3a: First, we show that under stochastic dominance and strict adherence, the unambiguous preference relation ⪯ of Definition 7 is represented by the weak∗-closed convex hull of Π(Γ) in the sense of Proposition 5 of Ghirardato, Maccheroni, and Marinacci (2004). Let f, g ∈ F. If for all π ∈ Π(Γ), π(u(f)) ≥ π(u(g)), then by linearity, for all h ∈ F, π(u(λf + (1 − λ)h)) ≥ π(u(λg + (1 − λ)h)), hence f ⪰ g by Bewley Unanimity (Axiom SD∗). If ≼ satisfies SD∗ with respect to a weak∗-closed convex set of priors Π which is strictly included in Π(Γ), then it is easily seen that ≼ is compatible with $E_\Pi$, which contains $E_{\Pi(\Gamma)}$, hence by strict adherence (Axiom SA), $E_\Pi = E_{\Pi(\Gamma)}$, and the result follows.

There remains to show the right-hand side of representation (3). It follows from Theorem 1 of Castaldo, Maccheroni, and Marinacci (2004) and the following sequence of equalities:

$$
\begin{aligned}
\int_\Omega \inf_{s \in \Gamma(\omega)} \{u(f(s))\}\, dP(\omega)
&= \int_0^\infty P\Big\{\omega \in \Omega : \inf_{s \in \Gamma(\omega)} \{u(f(s))\} \ge y\Big\}\, dy
 + \int_{-\infty}^0 \Big( P\Big\{\omega \in \Omega : \inf_{s \in \Gamma(\omega)} \{u(f(s))\} \ge y\Big\} - 1 \Big)\, dy \\
&= \int_0^\infty P\{\omega \in \Omega : \Gamma(\omega) \subseteq \{u(f) \ge y\}\}\, dy
 + \int_{-\infty}^0 \big( P\{\omega \in \Omega : \Gamma(\omega) \subseteq \{u(f) \ge y\}\} - 1 \big)\, dy \\
&= \int_0^\infty \rho(\{u(f) \ge y\})\, dy + \int_{-\infty}^0 \big( \rho(\{u(f) \ge y\}) - 1 \big)\, dy \\
&= \int_{Ch} u(f)\, d\rho
\end{aligned}
$$

to conclude. □
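The chain of equalities above can be verified numerically when all spaces are finite. The sketch below assumes a hypothetical probability P, correspondence `Gamma` and utility profile `u_f` (including a negative value, so that both integrals in the layer-cake formula matter); the helper `choquet` is an illustrative finite-sum version of the Choquet integral for a normalized set function, not code from the paper.

```python
from fractions import Fraction

# Hypothetical finite example (illustrative numbers only).
P = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4)}
Gamma = {
    "w1": frozenset({"s1"}),
    "w2": frozenset({"s1", "s2"}),
    "w3": frozenset({"s2", "s3"}),
}
u_f = {"s1": 3, "s2": -1, "s3": 8}  # utility profile u(f(s)), assumed values


def rho(A):
    """Willingness to bet induced by Gamma: P(Gamma is contained in A)."""
    return sum(p for w, p in P.items() if Gamma[w] <= A)


def choquet(u, nu):
    """Choquet integral of a finite-valued u with respect to a normalized nu.

    Uses the layer decomposition: lowest value times nu(S), plus each
    increment between consecutive values times nu of the upper level set.
    """
    levels = sorted(set(u.values()))
    total = levels[0] * nu(frozenset(u))
    for lo, hi in zip(levels, levels[1:]):
        upper_set = frozenset(s for s in u if u[s] >= hi)
        total += (hi - lo) * nu(upper_set)
    return total


# Left-hand side: integral over Omega of the infimum of u(f) on Gamma(omega).
lhs = sum(p * min(u_f[s] for s in Gamma[w]) for w, p in P.items())
# Right-hand side: Choquet integral of u(f) with respect to rho.
rhs = choquet(u_f, rho)
```

Both sides evaluate to the same number in this example, as the equalities in the proof require.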

Proof of Corollary 1: All we need to show is that under Axiom 6, the willingness to bet identified in Lemma 1 is convex (see Definition 15). Act $\tfrac{1}{2}(xAy + zBt)$ yields $\tfrac{1}{2}(u(x) + u(z))$ when A ∩ B occurs, $\tfrac{1}{2}(u(x) + u(t))$ when $A \cap B^c$ occurs, $\tfrac{1}{2}(u(y) + u(z))$ when $A^c \cap B$ occurs, and $\tfrac{1}{2}(u(y) + u(t))$ when $(A \cup B)^c$ occurs. Since Proposition 3a established that ≼ is CEU, acts are assessed by means of the Choquet integral of their utility with respect to the willingness to bet ρ. To fix ideas, we can assume that u(x) − u(y) ≥ u(z) − u(t). So Axiom 6 implies that

$$
\int_{Ch} \tfrac{1}{2}\, u(xAy + zBt)\, d\rho \ge \int_{Ch} u(zBt)\, d\rho .
$$

Since A and B play symmetric roles in Axiom 6, this also implies

$$
\int_{Ch} \tfrac{1}{2}\, u(xAy + zBt)\, d\rho \ge \tfrac{1}{2} \left( \int_{Ch} u(zBt)\, d\rho + \int_{Ch} u(xAy)\, d\rho \right).
$$

Hence,

$$
\begin{aligned}
&\tfrac{1}{2}(u(y) + u(t))(1 - \rho(A \cup B)) + \tfrac{1}{2}(u(y) + u(z))(\rho(A \cup B) - \rho(A^c \cap B)) \\
&\quad + \tfrac{1}{2}(u(x) + u(t))(\rho(A^c \cap B) - \rho(A \cap B)) + \tfrac{1}{2}(u(x) + u(z))\,\rho(A \cap B) \\
&\ge \tfrac{1}{2} \left\{ u(x)\rho(A) + u(z)\rho(B) + u(y)(1 - \rho(A)) + u(t)(1 - \rho(B)) \right\}.
\end{aligned}
$$

Finally,

$$
(u(z) - u(t))\big(\rho(A \cup B) + \rho(A \cap B) - \rho(B) - \rho(A^c \cap B)\big) \ge (u(x) - u(y))\big(\rho(A) - \rho(A^c \cap B)\big),
$$

which, given u(x) > u(y), u(z) > u(t) and u(x) − u(y) ≥ u(z) − u(t), implies ρ(A ∪ B) + ρ(A ∩ B) ≥ ρ(A) + ρ(B), so that ρ is convex. □

Proof of Lemma 3a: It is an immediate consequence of the supermodularity of the lower envelope φ of Π (and the corresponding submodularity of the upper envelope Φ). Indeed if A and B are assumed unambiguous, we have

$$
\Phi(A \cup B) + \Phi(A \cap B) \le \Phi(A) + \Phi(B) = \varphi(A) + \varphi(B) \le \varphi(A \cup B) + \varphi(A \cap B).
$$

So the class is immediately seen to be a λ-system, by considering A ∩ B = ∅, and a π-system, since φ(A ∪ B) ≤ Φ(A ∪ B) is always true. □
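The conclusion of Lemma 3a can be illustrated by enumerating events on a finite space. The sketch below assumes a hypothetical four-point space S, a hypothetical probability P and a hypothetical correspondence `Gamma`; it collects the events whose lower and upper probabilities coincide and checks that this class is closed under complements, unions and intersections.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite example (names and numbers are illustrative assumptions).
S = frozenset({"s1", "s2", "s3", "s4"})
P = {"w1": Fraction(1, 3), "w2": Fraction(1, 3), "w3": Fraction(1, 3)}
Gamma = {
    "w1": frozenset({"s1", "s2"}),
    "w2": frozenset({"s3"}),
    "w3": frozenset({"s3", "s4"}),
}


def lower(A):
    """Lower probability of A: P(Gamma is contained in A)."""
    return sum(p for w, p in P.items() if Gamma[w] <= A)


def upper(A):
    """Upper probability of A: P(Gamma meets A)."""
    return sum(p for w, p in P.items() if Gamma[w] & A)


events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

# An event is unambiguous when its lower and upper probabilities coincide.
unambiguous = {A for A in events if lower(A) == upper(A)}

closed_under_complement = all(S - A in unambiguous for A in unambiguous)
closed_under_union_and_intersection = all(
    (A | B) in unambiguous and (A & B) in unambiguous
    for A in unambiguous
    for B in unambiguous
)
is_algebra = closed_under_complement and closed_under_union_and_intersection
```

In this example the unambiguous events are exactly the unions of the two blocks {s1, s2} and {s3, s4}, so the class is generated by a partition of S.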

Proof of Lemma 3b: Follows from Proposition 10 of Ghirardato, Maccheroni, and Marinacci (2004). □

Proof of Lemma 3c: If λf + (1 − λ)h ∼ λg + (1 − λ)h for all h ∈ F and all λ ∈ [0, 1], then for all π ∈ Π(Γ), π(u(f)) = π(u(g)). Hence the function ζ = u(f) − u(g) satisfies π(ζ) = 0 for all π ∈ Π(Γ). Since f and g are elements of F, we can write ζ as a linear combination of indicators of elements of a partition $(S_i)_{i=1}^{n}$ of S, say $\zeta = \sum_{i=1}^{n} \zeta_i I_{S_i}$. Consider γ and $\tilde{\gamma}$ two measurable selections of Γ. We can suppose γ and $\tilde{\gamma}$ are onto since for all π ∈ Π(Γ), the equivalence class of functions γ such that $\pi = P\gamma^{-1}$ contains onto representatives. Since we know from Lemma 4 that Sel(Γ) is decomposable, we have for all A ∈ 𝒜, $P(\zeta \circ (\gamma A \tilde{\gamma})) = 0$. Hence, we have

$$
\begin{aligned}
0 &= \int_A \zeta \circ \gamma\, dP + \int_{A^c} \zeta \circ \tilde{\gamma}\, dP \\
&= \int_A \zeta \circ \gamma\, dP + \int_\Omega \zeta \circ \tilde{\gamma}\, dP - \int_A \zeta \circ \tilde{\gamma}\, dP \\
&= \int_A \zeta \circ \gamma\, dP - \int_A \zeta \circ \tilde{\gamma}\, dP,
\end{aligned}
$$

so that ζ ◦ γ and ζ ◦ $\tilde{\gamma}$ are equal P-almost everywhere. If $S_j$ is ambiguous, then there exist A ⊂ Ω with P(A) > 0 and k ≠ j such that $\gamma(A) \subseteq S_j$ and $\tilde{\gamma}(A) \subseteq S_k$. Hence, ζ ◦ γ = ζ ◦ $\tilde{\gamma}$ P-almost everywhere implies $\zeta_j = \zeta_k$. If $S_j \cup S_k = S$, ζ is constant. Otherwise, redefine ζ on a partition of S with n − 1 elements and proceed by reverse induction. □

Proof of Proposition 3b: Follows from Lemmas 3a and 3b, combined with Theorem 11 of Ghirardato, Maccheroni, and Marinacci (2004). □

Proof of Theorem 2a: $\preccurlyeq_S$ is a maxmin preference relation under Axioms 1-5 and 6∗ (this is the main result of Gilboa and Schmeidler (1989)). Call C the weak-star closed convex set of priors in the representation; in other words, there exists a cardinal utility function $u_S$ such that for all f, g ∈ F,

$$
f \preccurlyeq_S g \iff \min_{\pi \in C} \int_S u_S(f)\, d\pi \le \min_{\pi \in C} \int_S u_S(g)\, d\pi .
$$

By Theorem 14 of Ghirardato, Maccheroni, and Marinacci (2004), we observe that the unambiguous preference relation ⪯ of Definition 7 is represented by Bewley Unanimity with respect to C, in the sense that for all f, g ∈ F,

$$
f \preceq g \iff \int_S u_S(f)\, d\pi \le \int_S u_S(g)\, d\pi \quad \text{for all } \pi \in C. \tag{4}
$$

Let ρ be the lower envelope of the set C, i.e. for all A ∈ Σ,

$$
\rho(A) = \inf_{\pi \in C} \pi(A).
$$

We show below that Axiom 7 allows the identification of ρ as the willingness to bet of the decision maker. Consider an event A ∈ Σ. Under Axioms 1-5, the act xAy, with x ≻ y, has a certainty equivalent $\tilde{a} \in Y$, so that $xAy \sim_S \tilde{a}$. Now consider the set U = {a ∈ Y | xAy ⪰ a}. Given the representation (4), it is clear that U is closed, so that we can set $\hat{a} = \max U$ and we have $xAy \sqsubseteq \hat{a}$. Calling α the positive real number such that $\hat{a} = \alpha(x - y)$, we have $xAy \sqsubseteq \alpha(x - y)$. Given the definition of ⊒, we have

$$
u_S(x)\pi(A) + u_S(y)(1 - \pi(A)) \ge \alpha\,(u_S(x) - u_S(y)) \quad \text{for all } \pi \in C,
$$

so that ρ(A) ≥ α, and for all η, ε such that ε ≻ η, there exists a π ∈ C such that

$$
u_S(\eta) + u_S(x)\pi(A) + u_S(y)(1 - \pi(A)) < u_S(\varepsilon) + \alpha\,(u_S(x) - u_S(y)),
$$

and $\rho(A) < \alpha + u_S(\varepsilon) - u_S(\eta)$. We conclude that ρ(A) = α. Hence, Axiom 7-∞ implies that ρ is an infinitely monotone capacity. In particular, this implies that it coincides with the willingness to bet of the decision maker, and therefore, that $\preccurlyeq_S$ is also a Choquet Expected Utility preference relation.

Since $\preccurlyeq_\Omega$ satisfies Axioms 1-3, 4∗, 5, A1 and A2, it is an Expected Utility preference relation with nonatomic and countably additive subjective probability P on (Ω, 𝒜). So Theorem 4 ensures the existence of a measurable correspondence Γ : Ω ⇒ S such that for all A ∈ Σ,

$$
\rho(A) = P(\{\omega \in \Omega \mid \Gamma(\omega) \subseteq A\}). \tag{5}
$$

The result follows now from the computation of the Choquet Expected Utility with respect to ρ that we performed in the proof of Proposition 3a. □

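Proof of Theorem 2b: Consider an alternative analogy (Ω′, 𝒜′, P′). Then, from (5), we have for all A ∈ Σ,

$$
P(\{\omega \in \Omega \mid \Gamma(\omega) \subseteq A\}) = P'(\{\omega \in \Omega' \mid \Gamma'(\omega) \subseteq A\}). \tag{6}
$$

Let $\{\gamma_i\}_{i \in \mathbb{N}}$ be a Castaing representation of Γ. Then, for all closed A, Γ(ω) ⊆ A ⟺ $\gamma_i(\omega) \in A$ for all i ∈ ℕ. Hence,

$$
P(\Gamma \subseteq A) = P\left( \bigcap_{i \in \mathbb{N}} \{\gamma_i \in A\} \right).
$$

Let τ : Ω′ → Ω be an isomorphism of measure algebras. Then

$$
P\left( \bigcap_{i \in \mathbb{N}} \{\gamma_i \in A\} \right) = P'\left( \bigcap_{i \in \mathbb{N}} \{\gamma_i \circ \tau \in A\} \right) = P'\big( \tilde{\Gamma}' \subseteq A \big),
$$

where $\tilde{\Gamma}'$ is the closure of $\{\gamma_i \circ \tau\}_{i \in \mathbb{N}}$. So from (6), there exists an isomorphism of measure algebras $\tilde{\tau}$ on Ω′ such that $\Gamma' = \tilde{\Gamma}' \circ \tilde{\tau}$. The result follows with the equality $\Gamma \circ \tau \circ \tilde{\tau} = \Gamma'$. □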

References

Amarante, M. (2004): “States, models and unitary equivalence I: representation theorems and analogical reasoning,” unpublished manuscript.
Anscombe, F., and R. Aumann (1963): “A definition of subjective probability,” Annals of Mathematical Statistics, 34, 199–205.
Arrow, K. (1970): Essays in the theory of risk-bearing. North-Holland.
Aubin, J.-P., and H. Frankowska (1990): Set-valued analysis. Boston: Birkhäuser.
Aumann, R. (1965): “Integrals of set-valued functions,” Journal of Mathematical Analysis and Applications, 12, 1–12.
Billingsley, P. (1965): Ergodic Theory and Information. New York: Wiley.
Castaldo, A., F. Maccheroni, and M. Marinacci (2004): “Random sets and their distributions,” Sankhya (Series A), 66, 409–427.
Choquet, G. (1954): “Theory of capacities,” Annales de l’Institut Fourier, 5, 131–295.
Cohen, M., and J.-Y. Jaffray (1980): “Rational behaviour under complete ignorance,” Econometrica, 48, 1281–1299.
Dempster, A. P. (1967): “Upper and lower probabilities induced by a multi-valued mapping,” Annals of Mathematical Statistics, 38, 325–339.
Dempster, A. P. (1968): “A generalization of Bayesian inference,” Journal of the Royal Statistical Society. Series B, 30, 205–247.
Ellsberg, D. (1961): “Risk, ambiguity and the Savage axioms,” Quarterly Journal of Economics, 75, 643–669.
Gajdos, T., J.-M. Tallon, and J.-C. Vergnaud (2004): “Decision making with imprecise probabilistic information,” Journal of Mathematical Economics, 40, 647–681.
Ghirardato, P. (2001): “Coping with ignorance: unforeseen contingencies and nonadditive uncertainty,” Economic Theory, 17, 247–276.
Ghirardato, P., F. Maccheroni, and M. Marinacci (2004): “Differentiating ambiguity and ambiguity attitude,” Journal of Economic Theory, 118, 133–173.
Gilboa, I., and D. Schmeidler (1989): “Maximin expected utility with non-unique priors,” Journal of Mathematical Economics, 18, 141–153.
Gilboa, I., and D. Schmeidler (2001): A theory of case-based decisions. Cambridge University Press.
Hiai, F., and H. Umegaki (1977): “Integrals, conditional expectations and martingales of multivalued functions,” Journal of Multivariate Analysis, 7, 149–182.
Maccheroni, F., M. Marinacci, and A. Rusticchini (2004): “Variational representation of preferences under ambiguity,” ICER working paper 5/2004.
Manski, C. (2005): “Partial identification in econometrics,” forthcoming in the New Palgrave Dictionary of Economics, 2nd Edition.
Nehring, K. (2001): “Ambiguity in the context of probabilistic beliefs,” unpublished manuscript.
Philippe, F., G. Debs, and J.-Y. Jaffray (1999): “Decision making with monotone lower probabilities of infinite order,” Mathematics of Operations Research, 24, 767–784.
Ramsey, F. (1926): “Truth and probability,” in Foundations of mathematics and other logical essays. Braithwaite, R.B., New York: Humanities Press 1950.
Rokhlin, V. (1949): “Selected topics from the metric theory of dynamical systems,” Uspekhi Matematicheskikh Nauk, 4, 57–128, translated in American Mathematical Society Transactions 49 (1966), 171–240.
Savage, L. (1954): The Foundations of statistics. New York: Wiley.
Schmeidler, D. (1989): “Subjective probability and expected utility without additivity,” Econometrica, 57, 571–587.
