
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Vol. 21, No. 5 (2013) 645–673
© World Scientific Publishing Company
DOI: 10.1142/S0218488513500311

HOW TO RANDOMLY GENERATE MASS FUNCTIONS

THOMAS BURGER
iRTSV – FR3425 (CNRS, CEA, INSERM, UJF), Laboratoire de Biologie à Grande Échelle,
iRTSV, CEA de Grenoble – Bât. C3, 17 rue des Martyrs,
38054 Grenoble Cedex 9, France
[email protected]
http://sites.google.com/site/thomasburgerswebpage/

SÉBASTIEN DESTERCKE
Heudiasyc – UMR 7253 (CNRS, UTC), Bureau A 113,
Université de Technologie de Compiègne, Centre de Recherches de Royallieu,
F-60205 Compiègne Cedex, France
[email protected]
https://www.hds.utc.fr/~sdesterc/dokuwiki/doku.php

Received 7 August 2012
Revised 25 April 2013

As Dempster–Shafer theory spreads across different application fields, and as mass functions are involved in more and more complex systems, the need for algorithms that randomly generate mass functions arises. Such algorithms can be used, for instance, to evaluate some statistical properties or to simulate the uncertainty in some systems (e.g., database content, training sets). As such random generation is often perceived as secondary, most of the proposed algorithms use straightforward procedures whose sample statistical properties can be difficult to characterize. Thus, although such algorithms produce randomly generated mass functions, they do not always produce what could be expected from them (for example, uniform sampling in the set of all possible mass functions). In this paper, we briefly review some well-known algorithms, explaining why their statistical properties are hard to characterize. We then provide relatively simple algorithms and procedures to perform efficient random generation of mass functions whose sampling properties are controlled.

Keywords: Dempster–Shafer theory; simulation algorithm; random generation.

1. Introduction

Sampling and simulation techniques are essential practical tools of uncertainty theories. For instance, when analytical calculations cannot be performed (because of practical or theoretical limitations), Monte-Carlo methods are particularly efficient at providing stochastic estimations of quantities of interest. In the probabilistic setting, simulating and sampling uncertainty models is an old problem for which a


large quantity of algorithms has been proposed and whose statistical properties are well known. However, in the context of so-called imprecise probability theories, that is, uncertainty theories where incompleteness and imprecision are explicitly integrated in the uncertainty models, building sound and efficient simulation techniques remains an open question, often considered as secondary. This is due to the fact that, in a given study, simulation is more often a needed tool than the central topic.

In this paper, we concentrate on the question of simulating and sampling a particular type of imprecise probability model: belief functions, or equivalently mass functions. Note that this problem of simulating and generating mass functions, i.e. using probabilistic tools to generate distributions representing uncertainty under the formalism of belief functions, is different from using sampling schemes (such as Monte-Carlo) to approximately estimate deterministic quantities (belief or plausibility degrees, combination results) from known mass functions. The first problem, tackled in this article, is seldom treated in the literature, while the second one has received much more attention. Readers interested in the latter problem can check Wilson's comprehensive survey1 (and more precisely the surveyed articles,2–4 which focus on Monte-Carlo methods), or other related articles.5,6

Randomly generating mass functions is useful in many settings. The main situations where such a random generation may be needed are the following:

(a) the statistical behavior of some functions (information measures,7 distances,8 conflict,9 etc.) summarizing some information about mass functions has to be tested. In this case, it is necessary to perform a uniform sampling over all mass functions;
(b) depending on the application, mass functions can be assumed to have some specific form, e.g. an expert providing consonant mass functions,10 a classifier providing or using simple mass assignments11 (e.g., 2-additive, simple support, . . . ), a sensor providing Bayesian mass functions, etc. In this case, the restriction describes a subset of the set of all mass functions, and it is necessary to sample uniformly in this subset;
(c) mass functions can be of general form, but they are expected to follow some tendency while still being pervaded with randomness. This can happen, for instance, when one wants to simulate a training set of data with uncertain labels from a training set with known labels. In this situation, a sampling procedure should (in the long run) give higher masses to focal elements containing the true label but still allow completely wrong mass functions to be sampled with a lower probability. This situation often happens when testing some data fusion methods or learning algorithms.12

Outside these main cases, there may be other situations where simulations are useful but where controlling the statistical behavior of the random generation is less critical, such as the empirical test of some conjectures prior to demonstrating them.13


The rest of this introduction contains the needed notations and basics about belief functions used in this paper, as well as a short review of the specific mass assignments that are most likely to be encountered in applications (hence the most likely to be simulated).

1.1. Basics on mass and belief functions

Let Ω be a finite set of elements called the frame of discernment, or for short, the frame. We denote by P(Ω) its power set. A mass function on Ω is a positive mapping m : P(Ω) → [0, 1] such that Σ_{A∈P(Ω)} m(A) = 1. We denote by MΩ the set of mass functions defined on Ω. Classically, mass functions are distinguished according to their support: a focal element is a subset A ⊆ Ω such that m(A) > 0, and the support Fm of a mass function m is the set of all its focal elements. In this article, the number of focal elements (i.e. |Fm|, the cardinality of the support) is denoted by N, with 2 ≤ N ≤ 2^|Ω| (sampling a mass function with one focal element being trivial). From the mass function m, several set functions can be obtained, such as the plausibility Pl and belief Bel measures defined for any A ⊆ Ω as

    Pl(A) = Σ_{E∩A≠∅, E⊆Ω} m(E)    (1)

    Bel(A) = Σ_{E⊆A, E≠∅} m(E).    (2)

These set functions are usually used to measure our (imprecise) confidence in the fact that event A will happen. Dempster–Shafer Theory (DST for short, or the Theory of Evidence14) provides reasoning tools to manipulate information represented as mass functions (or set functions induced by these masses). This theory has several interpretations, such as the Transferable Belief Model,15 the Shenoy–Shafer Architecture,12 the Theory of Hints,16 etc. Depending on the interpretation, the notion of mass function may have a different meaning: it can be seen as a floating probability, a convex family of probability distributions, a random set, etc. However, in all these cases, the frame Ω classically represents a set of (mutually exclusive) hypotheses, and mass functions on Ω are used to encode uncertain pieces of knowledge regarding this set of hypotheses. Moreover, whatever the interpretation, the properties of mass functions make them handy representations for reasoning under uncertainty. One of these useful properties, the one we are interested in, is the ease with which they can be simulated. As this property does not depend on the interpretation, we focus on the mathematical object, i.e. a function from P(Ω) to [0, 1], rather than on its interpretation as an uncertainty representation.


The problem of simulating mass functions is equivalent to that of randomly sampling elements out of MΩ according to a particular distribution D. In theory, D could be any distribution, but in practice such generality is seldom required, and we have already mentioned the main situations that are likely to happen. Checking the behavior of some function or concentrating on specific forms of mass functions (resp. Situations a and b) can be handled by uniform sampling over regions of MΩ, while following some tendency (Situation c) will typically require a non-uniform sampling. The purpose of this work is to provide generic algorithms simple enough to be of practical interest yet flexible enough to cover a wide range of situations, and whose long-run behavior can be controlled. Of course, it is always possible that some applications will have specific needs that go beyond the possibilities of the presented algorithms, yet we think this will seldom happen.

1.2. Short review of specific mass functions

Generic mass functions can be quite complex representations, requiring the storage of 2^|Ω| different values, where |Ω| is the cardinality of Ω. Mass functions encountered in practice are often simpler, because the information provided (by an expert, by statistical sampling, . . . ) will often have such a simpler form, or because computational tractability is an issue in the considered application. Here, we recall the main forms of simplified mass functions defined by some restriction applied to their support. A mass function m is said to be:

• k-additive17 if the cardinality of focal elements is less than or equal to k, i.e., ∀A ∈ Fm, |A| ≤ k; moreover, there must be at least one focal element with cardinality exactly k;
• k-intolerant if the cardinality of focal elements is higher than k, i.e., ∀A ∈ Fm, |A| > k;
• Dogmatic if Ω is not a focal element. A dogmatic mass is a k-additive mass function with k ≤ (|Ω| − 1);
• Normalized if ∅ is not a focal element;
• Bayesian if it is 1-additive and normalized (only singletons get a positive mass). Bayesian mass functions are equivalent to probability distributions;
• Linear-vacuous if only singletons and Ω are focal elements;
• Consonant if focal elements are nested, i.e., if A, B ∈ Fm, then A ⊆ B or B ⊆ A. Such mass functions induce plausibility and belief measures that are equivalent to possibility and necessity measures, respectively;
• Consistent if there is an element ω ∈ Ω that belongs to every focal element, i.e., ∃ω ∈ Ω such that A ∈ Fm ⇒ ω ∈ A;
• Categorical if there is a single focal element E such that m(E) = 1. Categorical mass functions correspond to classical sets;
• Vacuous if m(Ω) = 1. This mass function models ignorance, and is a specific case of categorical mass function;


• Simple support if there is a focal element E with m(E) = 1 − α and m(Ω) = α. Simple support mass functions are particular consonant mass functions, and play an essential role in the Theory of Evidence developed by Shafer.14 They correspond to the assessment Bel(E) = 1 − α;
• Dichotomous18 if the only possible focal elements of m are A ⊆ Ω, Ω \ A and Ω. Dichotomous mass functions are never consistent and dogmatic. They are particular cases of the recently introduced clouds19 and can be seen as a direct extension of simple support functions.

In the sequel, we will point out when an algorithm (or a family of algorithms) can be used to simulate one of those specific masses. This paper significantly extends and improves the first few algorithms presented in Ref. 20. Section 2 is a state of the art, where we describe algorithms from the literature and focus on explaining why the most intuitive algorithm to uniformly sample MΩ (the space of all masses) is not adapted. In Sec. 3, we provide the mathematical background necessary to derive adapted simulation algorithms. A first algorithm for uniform sampling over MΩ is presented, illustrated and justified in Sec. 4, in order to prepare for more general developments. Finally, a set of algorithms adapted to various types of mass functions is provided in Sec. 5.

2. State of the Art

In practice, the most intuitive algorithm is based on the following idea (see Algorithm 1): (1) define the number of focal elements N as the size of the power set of Ω (if the mass is normalized, then the size should be reduced by 1), (2) uniformly and independently sample N values in [0, 1] corresponding to the N focal elements, and (3) perform a normalization (by dividing these values by their sum) to enforce the constraint Σ_{A∈P(Ω)} m(A) = 1. It is often used13,21 because of its simplicity.

Algorithm 1: A classical algorithm to randomly generate a mass function.
Input: frame Ω
Output: random mass function m
1  F ← definePowerSetOf(Ω);
2  N ← sizeOf(F);
3  foreach i ≤ N do
4      m_i ← randomlySample(U([0, 1]));
5  m ← {(F_1, m_1/Σ_k m_k), (F_2, m_2/Σ_k m_k), . . . , (F_N, m_N/Σ_k m_k)};
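For concreteness, here is a minimal R transcription of Algorithm 1 (a sketch; the bitmask encoding of focal elements is our choice and not part of the pseudocode above):

# Algorithm 1 in R: one independent U([0,1]) draw per non-empty subset of Omega,
# followed by a global normalization, which couples all the components.
naive_mass <- function(n_omega) {
  n_sets <- 2^n_omega - 1    # non-empty subsets, indexed by bitmasks 1..n_sets
  v <- runif(n_sets)         # independent uniform draws (Line 4)
  v / sum(v)                 # normalization (Line 5): not uniform on the simplex
}
m <- naive_mass(3)           # m[b] is the mass of the subset with bitmask b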

The main problem with Algorithm 1 is that the distribution D it generates on MΩ is difficult to characterize. At first sight, one may think that it generates a uniformly distributed vector; however, this is not what happens, for the following reason: although (at Line 4) values are independently sampled from U([0, 1]) (i.e.


the uniform law on [0, 1]), they are normalized afterward (Line 5). As the normalization of each value involves all the other values, the focal elements cannot be considered as realizations of independent random variables any more (which is necessary for a vector to be uniformly distributed). As a matter of fact, this random vector obeys an unspecified (and non-uniform) law D. This can be experimentally illustrated: as the distribution of mass values is the same for each focal element (the focal elements are identically distributed), it is possible to picture the distribution of the whole vector (i.e. the mass function) by representing only one marginal distribution (i.e., the mass of one focal element). The histograms of Fig. 1 provide an approximation of the distribution of a focal element with 1000 samples (performed according to Algorithm 1 for increasing sizes of Ω). Roughly speaking, it appears that the distribution tends to the uniform distribution on [0, 2/(|P(Ω)| − 1)] and to the null distribution on [2/(|P(Ω)| − 1), 1] when |Ω| → ∞.

In other words, when the size of the frame increases, the support of the mass distribution of each focal element gets smaller and smaller. This shows that masses are not uniformly generated: unbalanced mass functions (i.e. with focal elements receiving very different assignments) are very unlikely, while roughly uniform mass functions (i.e. all N focal elements having a mass close to 1/N) are very likely. Thus, Algorithm 1 does not produce a uniform sampling over MΩ, as distribution D gives higher probability to uniform masses. Finally, Algorithm 1 samples according to a distribution which (1) is not controlled, leading to unexpected statistical behavior, and (2) cannot be tuned by a set of parameters so that the behavior of the sampling fits a wide variety of situations.

Another state-of-the-art algorithm is that of Tessem22 and Bauer.23 This algorithm was not explicitly built to provide a uniform sampling, but rather, according to Tessem, to provide mass functions with unbalanced assignments, and, according to Bauer, to sample mass functions close to the ones met in applications. This algorithm involves two random variables X and Y with identical exponential distributions. Once X is generated, one considers the random quantity Z = P(Y ≤ X), where P denotes the probability measure. It is well known that the cumulative distribution function FX of X satisfies FX(X) ∼ U([0, 1]), whatever the distribution of X (this property being crucial to the inversion sampling method and to the Monte-Carlo method). Bauer's and Tessem's algorithm is therefore equivalent to Algorithm 2, which iteratively assigns mass to focal elements according to a uniform sample, scaled by the total mass that remains to be assigned.

Algorithm 2 is interesting to validate a method to approximate a belief function with unbalanced assignments, as Tessem used it. However, the notion of "real life" mass functions mentioned by Bauer seems arguable, as each application can deal with different mass functions and as a mass function is a mathematical model not directly observable. Moreover, Algorithm 2 samples from a distribution that is quite difficult to characterize, and modifying it to fit a particular distribution or specific needs seems even more difficult than for Algorithm 1.


Fig. 1. Distribution of a mass assignment according to Algorithm 1 for increasing size of Ω.

So, although Algorithms 1 and 2 achieve random sampling on MΩ, they share the same shortcomings: their statistical properties are ill-defined, and adapting their behavior to the problem of interest seems difficult.


Algorithm 2: A simple algorithm equivalent to Bauer's one.
Input: frame Ω
Output: random mass function m
1  F ← definePowerSetOf(Ω);
2  N ← sizeOf(F);
3  Rest ← 1;
4  foreach i ≤ N − 1 do
5      m_i ← Rest × randomlySample(U([0, 1]));
6      Rest ← Rest − m_i;
7  m_N ← Rest;
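In R, Algorithm 2 can be sketched as follows (again a minimal illustration, with our own function name):

# Algorithm 2 in R: each focal element takes a uniform fraction of the mass
# that remains to be assigned; the last one takes the remainder.
bauer_mass <- function(n_sets) {
  m <- numeric(n_sets)
  rest <- 1
  for (i in seq_len(n_sets - 1)) {
    m[i] <- rest * runif(1)  # uniform sample scaled by the remaining mass
    rest <- rest - m[i]
  }
  m[n_sets] <- rest          # Line 7: the last focal element gets what is left
  m
}
m <- bauer_mass(2^3 - 1)     # note: the result depends on the element ordering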

3. Mathematical Tools

In this section, we provide the necessary mathematical insights to develop correct sampling algorithms (most of the results that are not demonstrated can be found in a detailed version in the textbook24). We first explain (Sec. 3.1) that categorical distributions and mass functions are similar objects, as both of them are defined by a set of N values in [0, 1] adding up to one (a point in the N-unit simplex). This allows us to use well-known results of Bayesian statistics and probability concerning categorical distributions. More precisely, we recall in Sec. 3.2 that the Dirichlet family can be used to randomly sample such categorical distributions (and therefore elements of MΩ). As sampling a random vector according to a Dirichlet distribution is not straightforward, we recall in Sec. 3.3 the link existing between Dirichlet distributions and exponential and gamma distributions to derive simple algorithms to sample from MΩ.

3.1. The categorical family and MΩ

The N-way categorical distribution is a discrete probability distribution that describes the result of a random event that can take on one of N exhaustive and exclusive outcomes O_1, . . . , O_N with probabilities p_1, . . . , p_N. When N = 2, the categorical distribution is the well-known Bernoulli distribution. Just as the binomial distribution describes the number of successes among n Bernoulli trials, the multinomial distribution describes the repartition amongst categories O_1, . . . , O_N of n categorical trials. A categorical distribution is completely defined by the probability p_i = P(X = O_i) of each outcome O_i, with Σ_{i=1}^N p_i = 1 and p_i ≥ 0. Thus, the vector (p_1, . . . , p_N) that identifies a categorical distribution lives in an (N − 1)-dimensional simplex, denoted C_N, and called the N-way categorical family. Recall that a mass function is such that Σ_{A∈P(Ω)} m(A) = 1 and m(A) ≥ 0; we can associate these constraints with those of (p_1, . . . , p_N), as shown by the next example and theorem.


Example 1. Let us consider the frame Ω = {ω1, ω2, ω3}, and a mass function m ∈ MΩ such that m({ω1}) = 0.2, m({ω2}) = 0.3 and m({ω1, ω3}) = 0.5. A compact storage format which uniquely identifies m ∈ MΩ is {0, 0.2, 0.3, 0, 0, 0.5, 0, 0}, i.e. a vector containing the 2^3 mass assignments required to define m. Now, let us consider a set of 8 outcomes O_1, . . . , O_8 and an 8-way categorical distribution over these outcomes. Such a categorical distribution can also be identified with a vector (p_1, . . . , p_8). More precisely, the distribution such that outcomes O_2, O_3 and O_6 have probabilities of 0.2, 0.3 and 0.5, respectively, has the following representation: {0, 0.2, 0.3, 0, 0, 0.5, 0, 0}, which corresponds to that of m ∈ MΩ.

Theorem 1. MΩ is isomorphic to C_{2^|Ω|}.

Proof. This simple theorem is central to understanding the random generation of mass functions. We propose two distinct proofs, each shedding a particular light on the related sampling problem.

The first proof shows that elements of MΩ and elements of C_{2^|Ω|} are in one-to-one correspondence. Let ω_i be the ith element of Ω. A well-known compact representation of any subset A of Ω is the binary number ∧_i δ_{ω_i ∈ A}, where δ is the Kronecker symbol and ∧ is the concatenation operator. Thus, a compact notation for a mass function m on Ω is an ordered list {p_1, . . . , p_{2^|Ω|}} of 2^|Ω| values such that the jth value p_j is m(A_j), where A_j is the subset whose binary representation is the binary version of the decimal number j. Moreover, as m(A_{2^|Ω|}) = 1 − Σ_{ℓ=1}^{2^|Ω|−1} m(A_ℓ), any mass function m is uniquely identified by {p_1, . . . , p_{2^|Ω|−1}}, which uniquely defines a (2^|Ω|)-way categorical distribution. Thus, any mass function m on Ω is equivalent to a (2^|Ω|)-way categorical distribution. As a consequence, MΩ is isomorphic to C_{|P(Ω)|} = C_{2^|Ω|}.

The second demonstration relies on more geometric arguments: C_{2^|Ω|} is known to be the standard (2^|Ω| − 1)-simplex, and Cuzzolin25 established in his geometrical interpretation of Dempster–Shafer theory that MΩ is also the standard (2^|Ω| − 1)-simplex.

Thus, picking a mass function m out of MΩ by means of a uniform sampling is equivalent to picking at random a (2^|Ω|)-way categorical distribution out of C_{2^|Ω|} according to a uniform distribution on C_{2^|Ω|}. To us, this point is particularly interesting, as the distribution of all the categorical distributions in C_{2^|Ω|} is a well-known mathematical object in the Bayesian community, named the Dirichlet distribution. This structural similarity between Dirichlet models and belief functions is mentioned in other works,26 which are also consistent with the simulation problem discussed here.


3.2. The Dirichlet distribution

The Dirichlet law Dir(π_1, . . . , π_N) of order N ≥ 2 with parameter π = (π_1, . . . , π_N) ∈ [0, 1]^N has the following probability density function:

    C_N → [0, 1]
    x ↦ f_Dir(x; π) = ( Γ(π_0) / Π_{i=1}^N Γ(π_i) ) Π_{i=1}^N x_i^{π_i − 1}    (3)

where x = {x_1, . . . , x_N} ∈ C_N with x_N = 1 − Σ_{i=1}^{N−1} x_i, π_0 = Σ_{i=1}^N π_i, and Γ is the gamma function (Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt for any complex number z with a strictly positive real part). f_Dir is a probability density over C_N and therefore over the set of all the N-way categorical distributions. This is why it is called the conjugate prior of the categorical distributions. Hence, a trial according to the Dirichlet distribution results in an N-way categorical distribution. It is then straightforward to associate a trial result to a mass function chosen from MΩ.

The definition of a Dirichlet distribution depends on a vector of parameters π = (π_1, . . . , π_N) ∈ [0, 1]^N, and these parameters determine the probability density function over C_N (the law according to which categorical distributions will be drawn). We recall two properties about π that will be instrumental in the sequel:

Property 1. Let X be a random vector following a Dirichlet distribution of parameter π = (π_1, . . . , π_N) ∈ [0, 1]^N (or X ∼ Dir(π_1, . . . , π_N) for short). We have:

    E[X] = ( π_1/π_0, . . . , π_i/π_0, . . . , π_N/π_0 ),

where E[X] is the expectation of (3).

In other words, the expected mass assigned to the ith focal element of mass functions sampled from C_{2^|Ω|} according to a Dirichlet distribution of parameter π = (π_1, . . . , π_{2^|Ω|}) ∈ [0, 1]^{2^|Ω|} will be π_i / Σ_{j=1}^{2^|Ω|} π_j. This property will be useful to set the mean behavior of simulated masses. The next property characterizes more precisely the distribution of the masses of each focal element:

Property 2. Let X ∼ Dir(π_1, . . . , π_N), and let X_i be the ith component of X. Then, the bivariate vector (X_i, 1 − X_i) follows a Dirichlet law of order 2 and of parameters (π_i, π_0 − π_i):

    (X_i, 1 − X_i) ∼ Dir(π_i, π_0 − π_i).

A Dirichlet distribution of order 2 (i.e. N = 2) is also called a Beta distribution, and it is the conjugate prior of a categorical distribution with only two outcomes, i.e. a Bernoulli distribution. Thus, if the random vector X follows a Dirichlet distribution and its outcome is interpreted as the random generation of a mass function m, then the outcome of (X_i, 1 − X_i) corresponds to the generation of the two following assignments: m(A_i) and 1 − m(A_i).


This result is particularly interesting: it means that the marginal distribution of a Dirichlet distribution is still a Dirichlet distribution, and that the mass assigned to any focal element of a randomly generated mass function follows a beta distribution. As a consequence, whatever the cardinality of the frame Ω, it is possible to display and analyze a distribution of mass functions over MΩ by considering the distribution of each focal element (or a single one of them, if they are i.i.d.). This will be instrumental in Sec. 4, where we compare the histogram derived from several random generations to the theoretical distribution.

If a mass assignment m is sampled uniformly from MΩ, then every focal element of m should have the same expected mass. Indeed, the following property

Property 3. In the particular case where π = 1_k (i.e. π_i = 1 ∀i), the Dirichlet distribution is equivalent to the uniform distribution on C_k

leads us to the following theorem, which defines the uniform probability on the set of mass functions:

Theorem 2. The uniform distribution on MΩ is given by the Dirichlet distribution of order 2^|Ω| and of parameters π_i = 1 ∀i ∈ (1, . . . , 2^|Ω|).

Proof. Direct consequence of the application of Th. 1, followed by the application of Prop. 3.

3.3. Links with the gamma distribution

So far, the link between sampling over MΩ and random generation according to the Dirichlet distribution is clear. Thus, from a theoretical point of view, we know how to uniformly generate mass functions. However, in practice, the Dirichlet law is not straightforward to deal with. This is why we recall here the link between the Dirichlet distribution and the very classical gamma distribution. The gamma distribution Gamma(α, β) with shape parameter α ∈ R*₊ and rate (inverse scale) parameter β ∈ R*₊ has the following probability density function:

    R₊ → [0, 1]
    x ↦ f_Gamma(x; α, β) = ( β^α / Γ(α) ) x^{α−1} e^{−βx}

Let us recall that the gamma distribution can be seen as a generalization of the exponential distribution, as Gamma(α = 1, β) = Exp(β). The link between the gamma and the Dirichlet distributions is given by the following property:

Property 4. Let X_1, . . . , X_N be N independent random variables such that X_i ∼ Gamma(π_i, β). The random vector

    Y = ( X_1/Σ_{j=1}^N X_j, . . . , X_i/Σ_{j=1}^N X_j, . . . , X_N/Σ_{j=1}^N X_j )

follows a Dirichlet law of parameters (π_1, . . . , π_N).

Then, by generating independent values with gamma distributions sharing the same rate parameter, it is possible to generate a vector according to a Dirichlet distribution.
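Property 4 translates directly into code. A minimal R sketch (rdirichlet is our own helper, not a base R function):

# One draw from Dir(pi_1, ..., pi_N) obtained by normalizing N independent
# gamma variables sharing the same rate parameter (Property 4).
rdirichlet <- function(pi_vec) {
  g <- rgamma(length(pi_vec), shape = pi_vec, rate = 1)
  g / sum(g)
}
x <- rdirichlet(c(1, 1, 1, 1))   # with pi = 1_N: uniform on C_4 (Property 3)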


4. Sampling on MΩ

We first deal with the problem of sampling over the whole MΩ, before dealing with more specific cases.

Algorithm 3: Algorithm to uniformly sample a mass function on MΩ.
Input: frame Ω
Output: random mass function m
1  F ← definePowerSetOf(Ω);
2  N ← sizeOf(F);
3  foreach i ≤ N do
4      m_i ← randomlySample(Exp(1));
5  m ← {(F_1, m_1/Σ_k m_k), (F_2, m_2/Σ_k m_k), . . . , (F_N, m_N/Σ_k m_k)};
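A direct R transcription of Algorithm 3 might look as follows (a sketch; the power set helper and its bitmask encoding are ours):

# Uniform sampling of a normalized mass function on M_Omega (Algorithm 3).
powerSet <- function(n) {          # all non-empty subsets of 1:n as index vectors
  lapply(seq_len(2^n - 1), function(b) which(bitwAnd(b, 2^(0:(n - 1))) > 0))
}
uniform_mass <- function(n_omega) {
  supp <- powerSet(n_omega)        # focal elements (Line 1)
  w <- rexp(length(supp), rate = 1)   # i.i.d. Exp(1) = Gamma(1,1) draws (Line 4)
  list(focal = supp, mass = w / sum(w))  # Dirichlet(1,...,1): uniform on the simplex
}
m <- uniform_mass(3)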

4.1. The algorithm

Generating mass functions with a uniform distribution over N focal elements comes down to generating a random vector Y ∼ Dir(1_N), where 1_N is the vector (1, . . . , 1) of length N. A simple means of doing so is to generate N i.i.d. Gamma(1, 1)-distributed random variables or, equivalently, N i.i.d. samples of the exponential law Exp(β = 1), for which most programming languages provide a generator. For instance, in R (http://cran.r-project.org), the call

> rexp(N, rate = 1)

provides N such independent random simulations. Finally, this leads us to Algorithm 3, which is very close to Algorithm 1.

Actually, we can also interpret Algorithm 3 as sampling a mixture of the extreme uncertainty models that are the categorical mass functions. Indeed, MΩ is a special kind of polytope or convex set (a simplex) whose vertices are the categorical masses m(F_i) = 1. Sampling a point in this polytope then comes down to sampling a convex mixture of its vertices. In Algorithm 3, this mixture is the one given by Line 5. Such a view is related to other works concerning more general models of uncertainty,27 where the identification of vertices satisfying some constraint is more complex (here, our vertices correspond to extreme belief functions).

4.2. Description of the expected marginal distribution

Before testing Algorithm 3 against Algorithm 1, let us try to picture the expected distribution it provides. Here again, the mass assignments of the focal elements are identically distributed, so that it is possible to picture the mass distribution using the distribution of a single focal element. To understand the behavior of the Dirichlet distribution when the number N of focal elements increases, we can link it with the geometry of the simplex C_N.


In the case of a uniform sampling, the integral of the probability density function over any subset S of C_N is equal to the proportion of A_S (the hypervolume of S) in A_{C_N} (the hypervolume of C_N):

    A_S / A_{C_N} = ∫_S f_Dir(x, 1_N) dS

with

    A_{C_N} = ∫_{C_N} 1_{C_N} dC_N

For the sake of clarity, let us expand the integrals over S and C_N. Let q < N be the dimensionality of S, and let x_1, . . . , x_q be the corresponding q variables (S lives in R^q). Let x_{q+1}, . . . , x_N be the remaining variables needed to define a point in C_N. Then, we have:

    A_S / A_{C_N} = ∫_{x_1} · · · ∫_{x_q} f_Dir(x, 1_N) dx_q . . . dx_1

    A_{C_N} = ∫_{x_1} · · · ∫_{x_N} 1_{C_N} dx_N . . . dx_1

Practically, when representing the marginal probability density function of a focal element, one represents

    (1/A_{C_N}) ∫_{x_1} · · · ∫_{x_{N−1}} f_Dir(x, 1_N) dx_{N−1} . . . dx_1

as a function of x_N. Hence, the histograms provided by a uniform simulation are also staircase approximations of the marginal integral of the simplex C_N for various values of N. A geometrical illustration is provided in Fig. 2, which pictures a 3D tetrahedron (corresponding to C_4). The surface of the slice of the tetrahedron,

Fig. 2. Representation of the simplex C4 (a tetrahedron), and the corresponding marginal integral.


displayed as a function of the height of the slice, corresponds to a beta distribution. The parameters of this distribution are directly linked to the dimensionality of the tetrahedron. Precisely, we have π = (1, N − 1) or, in the case of C_4, π = (1, 3). The higher the number of dimensions, the faster the surface decreases as the height of the slice increases (the angles of the simplex are narrower). This is illustrated by the shapes of the distributions of Fig. 3, which correspond to the theoretical limits of the histograms we can expect when using Algorithm 3 (or when sampling uniformly in MΩ).

4.3. Experimental assessment

Figure 4 displays histograms obtained by repeated application of Algorithm 3 for various values of |Ω|. Obviously, the distributions of Figs. 3 and 4 concur, giving an experimental confirmation that Algorithm 3 behaves as expected, and that these histograms fit the density of a beta distribution of parameters (1, 2^|Ω| − 2).

Fig. 3. Theoretical distributions of random variables following Beta(1, 2^|Ω| − 2) laws, for various values of Ω, which correspond to the marginal uniform distributions over C_{2^|Ω|−1}. Let us remark that normalized masses (i.e. with m(∅) = 0) live in C_{2^|Ω|−1} instead of C_{2^|Ω|}. This loss of one dimension is convenient for graphical representations.


Fig. 4. Empirical distribution (approximated by histograms) of a normalized mass assignment according to Algorithm 3 for increasing size of Ω. They completely concur with the expected distribution displayed in Fig. 3.
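This agreement can also be checked numerically; a short R sketch (with our own variable names) overlaying the empirical histogram on the theoretical Beta(1, 2^|Ω| − 2) density:

# Marginal of one focal element under uniform sampling of normalized masses:
# N = 2^|Omega| - 1 focal elements, theoretical marginal Beta(1, N - 1).
n_omega <- 3; N <- 2^n_omega - 1
samples <- replicate(1000, { w <- rexp(N); (w / sum(w))[1] })
hist(samples, breaks = 30, freq = FALSE, xlab = "mass", main = "First focal element")
curve(dbeta(x, 1, N - 1), add = TRUE)   # theoretical density: Beta(1, 2^|Omega| - 2)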

On the other hand, as expected, the histograms of Fig. 1 do not correspond to the theoretical distributions depicted in Fig. 3.

4.4. An alternative algorithm based on a Poisson process

Algorithm 4 is an alternative to Algorithm 3. The intuition behind this algorithm^a is the following: when simulating according to an N-order categorical distribution, one simply uses a partition^b {[0, y_1], [y_1, y_2], . . . , [y_{N−1}, 1]} of cardinality N over the unit segment, the ith element of this partition being of length equal to P(O_i). Thus, an intuitive way to produce this partition is to randomly pick N − 1 values, to sort them in increasing order, and then to compute consecutive pairwise differences (Algorithm 4). In fact, if these values are picked according to a uniform distribution, it can be established that the resulting categorical distribution is uniformly distributed on C_N.

^a Proposed by an anonymous reviewer of the short version of this article.20
^b As points have a null Lebesgue measure, closed or open intervals are equivalent.


Algorithm 4: Poisson-inspired algorithm for uniform sampling.
Input: frame Ω
Output: random mass function m
1  F ← generatePowerSetOf(Ω);
2  N ← sizeOf(F);
3  foreach 1 ≤ i ≤ N − 1 do
4      x_i ← randomlySample(U([0, 1]));
5  {y_1, . . . , y_{N−1}} ← sortIncreasingOrder({x_1, . . . , x_{N−1}});
6  y_0 ← 0, y_N ← 1;
7  foreach 1 ≤ i ≤ N do
8      m(F_i) ← y_i − y_{i−1};
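In R, Algorithm 4 reduces to a few lines (a sketch with our own function name):

# Algorithm 4 in R: sort N-1 uniform draws on [0,1]; the N consecutive gaps
# form a vector that is uniformly distributed on the simplex.
spacings_mass <- function(n_sets) {
  y <- sort(runif(n_sets - 1))     # Lines 3-5
  diff(c(0, y, 1))                 # Lines 6-8: gaps sum to one
}
m <- spacings_mass(2^3 - 1)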

To picture this, let us recall that a Poisson process is a continuous-time random process, the purpose of which is to count the occurrences of events, and which has these three equivalent properties (any of them implying the two others): (1) during any interval of time, the number of events follows a Poisson law with parameter ∆ × λ, where ∆ is the duration of the interval and λ a constant characterizing the intensity of the events; (2) whatever the interval of time, the events are uniformly distributed on it; and (3) the time between two consecutive events follows an exponential law with parameter λ.

Let us focus on the last two properties. Putting them together, a strong link between exponential sampling and uniform sampling appears: the difference between two consecutive values obtained from a set of uniform trials is exponentially distributed. On the other hand, exponential sampling is exactly what is used in Algorithm 3 to sample according to the Dirichlet law. Thus, generating a Dirichlet trial (with parameter π = {1, . . . , 1}) from a set of uniform trials makes sense.

In spite of a comparable computational complexity, our algorithm has a clear advantage over Algorithm 4: it can be straightforwardly extended to non-uniform sampling (with a very small increase in complexity), while Algorithm 4 cannot. Thus, compared to Algorithms 1 and 2, Algorithm 4 has the advantage of having a well-defined statistical behavior, but can be hard to adapt to different situations.

4.5. Non-uniform sampling on MΩ

If some focal elements should receive, in the long run, more mass than others (e.g., to create noisy masses centered on a true known value), then a non-uniform sampling is needed. In this case, the random generation should be performed according to Dir(π_1, . . . , π_N) with (π_1, . . . , π_N) ≠ 1_N. By Property 4, this can be achieved straightforwardly by replacing exponential sampling with gamma sampling at Line 4 of Algorithm 3, leading to Algorithm 5.


As with the exponential distribution, the generation of gamma-distributed random variables is straightforward using a high-level language such as R:

> rgamma(k, shape = 0.5, scale = 1);

Algorithm 5: Algorithm to non-uniformly sample a mass function.
Input: frame Ω, vector of parameters of Dirichlet law π
Output: random mass function m
1  F ← definePowerSetOf(Ω);
2  N ← sizeOf(F);
3  foreach i ≤ N do
4      m_i ← randomlySample(Gamma(π_i, 1));
5  m ← {(F_1, m_1/Σ_k m_k), (F_2, m_2/Σ_k m_k), . . . , (F_N, m_N/Σ_k m_k)};

Algorithm 5 is extremely similar to Algorithm 3, and finally, there is not much difference between uniform and non-uniform sampling according to Dirichlet law. Hence, in the sequel of this article, we do not make the difference any more between uniform and non-uniform simulation: we simply consider the general case of non-uniform sampling, uniform sampling being retrieved when gamma laws have identical parameters. A last important question is the flexibility or generality of this algorithm: we propose to use only the Dirichlet family to sample CN in a non-uniform way, but obviously, there is an infinite number of distributions on CN which do not belong to the Dirichlet family. Hence, we could expect that some situations do not fall in the cases covered by Algorithm 5. As a matter of fact, Dirichlet family is really flexible, leading to an important set of distributions that are very likely to be sufficient for most of the applications. To illustrate that, let us consider the variety of marginal distributions for a single focal element. According to Property 2, the ith focal element is distributed according to Beta(π, π0 − πi ), π > 0, π0 − πi > 0. Figure 5 (Left) displays a set of beta distributions with various parameters. It appears that there is a huge variety of shapes. Multiplied by the number of focal elements, it leads to a great variety of mass profiles. Moreover, it is possible to use mixtures of Dirichlet modelsc to design yet other profiles. The next example illustrates when such mixtures can be useful. c Sampling

a mixture of models is very simple and does not modify Algorithm 5. It only modifies the function randomlySample, which now receives more parameters: The M sets of parameters of the M components of the mixture, as well as the M mixing proportions (adding up to one). The function should work as follow: first, the unit interval is partitioned into M intervals of sizes given by the mixing proportions. Then a uniform trial is conducted in order to randomly chose the component of the mixture. Finally the samplings are conducted according to that component.


Fig. 5. (Left) Probability density functions of beta laws with various parameters πi and π0 − πi : (0.1,0.2), (0.7,1.2), (2,4), (2,7), (4,7), (0.3,1.4), (7,8). (Right) Distribution of the mixture model described in Example 2.

Example 2. We aim at modeling a sensor that provides the color of items of interest, which can have 3 possible colors. We model the frame of discernment as Ω = {TC, FC1, FC2}, TC standing for the true color and FC for the false ones: the actual color does not matter, as we assume the sensor reliability does not depend on it. We want to model the following behavior:

• 75% of the time, the sensor works well, so that there is a high chance that it provides the correct color while being precise to some extent; we would thus like to sample a mass function with parameter π^1 = {2, 0.01, 0.02, 1, 0.4, 0.05, 0.8} in Algorithm 5. The π_i corresponding to the focal elements including the true color, A = {TC} (π^1_1 = 2), A = {TC, FC1} (π^1_4 = 1), and A = Ω (π^1_7 = 0.8), are higher (note that we sample on P(Ω) \ ∅, to have normalized masses);
• 25% of the time, the sensor fails and its information is rather precise but not especially centered around the true color. This can be modeled by sampling a mass function with parameter π^2 = {1, 1, 1, 0.05, 0.05, 0.05, 0.001}.

In such a case, one should use Algorithm 5 with a slightly modified randomlySample function, which allows sampling according to a mixture model with parameters {(π^1, 0.75), (π^2, 0.25)}. Finally, the distribution of focal element {TC} is a mixture, with proportions (0.75, 0.25), of two beta distributions of parameters (2, 2.28) and (1, 2.151) (by Property 2, with π_0 = 4.28 and π_0 = 3.151, respectively), which is depicted in Fig. 5 (Right). Obviously, the 7 other focal elements have different distributions, which are not depicted here. Finally, the expressive power of Dirichlet models is very high, and we believe that, when applied to the random generation of mass functions, their flexibility is sufficient for most situations.
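A minimal R sketch of the mixture sampling of Example 2 (function and variable names are ours; the component is chosen first, then a Dirichlet draw is made accordingly):

# Example 2: with probability 0.75 sample from Dir(pi1) ("sensor works"),
# otherwise from Dir(pi2) ("sensor fails"); masses are over P(Omega) \ {emptyset}.
pi1 <- c(2, 0.01, 0.02, 1, 0.4, 0.05, 0.8)
pi2 <- c(1, 1, 1, 0.05, 0.05, 0.05, 0.001)
sensor_mass <- function() {
  pi_vec <- if (runif(1) < 0.75) pi1 else pi2   # uniform trial choosing the component
  g <- rgamma(length(pi_vec), shape = pi_vec, rate = 1)
  g / sum(g)
}
m <- sensor_mass()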


5. Algorithms for Particular Types of Mass Functions

So far, we have only considered the case where mass functions are sampled from C_{2^|Ω|} (or C_{2^|Ω|−1} for normalized masses, such as in Example 2). However, in practice, there are several situations where one will be interested in simulating mass functions living in a subregion S of C_{2^|Ω|}. We investigate this problem in this section. Obviously, S could be of any arbitrary shape within C_{2^|Ω|}, leading to situations where the design of a sampling algorithm is really difficult (and not necessarily useful). Here, we mainly focus on the following cases (which cover the most commonly encountered situations): (1) S is a simplex; (2) S is a set of isomorphic simplices; (3) S is a set of simplices of different dimensionality. The reason is that the simplex C_{2^|Ω|} over which we are sampling and the support F are strongly linked. In fact, each focal element of the support corresponds to a dimension of the simplex, and reducing the support to a subset of P(Ω) roughly comes down to reducing C_{2^|Ω|} to a region S likely to have a simplicial structure. Indeed, we already intuitively did so when passing from general masses to normalized masses, as we replaced C_{2^|Ω|} by C_{2^|Ω|−1}. On the other hand, if we consider constraints on the mass assignments, for example ∃F_1, F_2 ∈ F such that 0.24 ≤ m(F_1) + m(F_2)² ≤ 0.38, then S may not have a simplicial structure any more. We say a few words about such cases at the end of the section.

5.1. Case 1: S is a simplex

The simplest case is when S is a sub-simplex of C_{2^|Ω|}. This case encompasses mass functions with a reduced number of focal elements which are not constrained or related to each other, that is, the set Fm of focal elements is identical for all m ∈ S. Also, since S is a convex set, for any m_1, m_2 in S, αm_1 + (1 − α)m_2 with α ∈ ]0, 1[ is also in S. Among the masses mentioned in Sec. 1.2, this includes k-additive (among which are Bayesian and dogmatic ones), k-intolerant and linear-vacuous mass functions (all of which can be required to be normalized or not). Vacuous mass functions also fall in this case, but sampling them makes no sense.

Sampling then comes down to just selecting those dimensions (i.e., focal elements) that must be kept, discarding the others and sampling as in Algorithm 5: this leads to Algorithm 6, which differs little from Algorithm 5. There is a simple additional instruction at Line 2, calling the function generateSupportOnCondition, the purpose of which is to scan all the elements E of P(Ω) and add them to the support as long as they fit a condition c. An R sketch is given after the algorithm.

Example 3. Assume Ω = {ω1, ω2, ω3, ω4} and we want to sample the set of 2-additive normalized mass functions. The condition c reads "E is a focal element iff |E| ≤ 2". Thus, generateSupportOnCondition provides the following support:

F = {{ω1}, {ω2}, {ω3}, {ω4}, {ω1, ω2}, {ω1, ω3}, {ω1, ω4}, {ω2, ω3}, {ω2, ω4}, {ω3, ω4}}


Algorithm 6: Algorithm to sample in a sub-simplex of C_{2^|Ω|}.
Input: frame Ω, condition c, vector of parameters of Dirichlet law π
Output: mass function m
1  P ← generatePowerSetOf(Ω);
2  F ← generateSupportOnCondition(P, c);
3  N ← sizeOf(F);
4  foreach i ≤ N do
5      m_i ← randomlySample(Gamma(π_i, 1));
6  m ← {(F_1, m_1/Σ_k m_k), (F_2, m_2/Σ_k m_k), . . . , (F_N, m_N/Σ_k m_k)};
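A possible R sketch of Algorithm 6 (the condition is passed as a predicate; helper names and the default uniform parameter are ours):

# Algorithm 6 in R: keep only the subsets satisfying condition c, then sample
# a Dirichlet vector over the retained support.
sample_on_condition <- function(n_omega, cond, pi_vec = NULL) {
  subsets <- lapply(seq_len(2^n_omega - 1),
                    function(b) which(bitwAnd(b, 2^(0:(n_omega - 1))) > 0))
  supp <- Filter(cond, subsets)              # generateSupportOnCondition (Line 2)
  if (is.null(pi_vec)) pi_vec <- rep(1, length(supp))
  w <- rgamma(length(supp), shape = pi_vec, rate = 1)
  list(focal = supp, mass = w / sum(w))
}
m <- sample_on_condition(4, function(E) length(E) <= 2)  # 2-additive masses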

Now, assuming further that we want rather imprecise 2-additive mass functions, we can pick π = (0.5, 0.5, 0.5, 0.5, 1, 1, 1, 1, 1, 1), which expresses the fact that the expected mass of binary sets will be twice that of singletons.

5.2. Case 2: S is a set of isomorphic simplices

We now deal with a somewhat more complex case that still encompasses many practical situations. In particular, it encompasses (1) mass functions with only N < |Ω| focal elements, (2) consonant and (3) consistent mass functions. In this section, we first devote some attention to these three situations, as they cannot be dealt with by the previous algorithms, before dealing with the general case where S is a set of isomorphic simplices.

5.2.1. General mass functions with fixed size of support

Let us first consider the case where the number of focal elements is a fixed number N, but where the support Fm can change. Note that if m_1 and m_2 do not have the same N focal elements, then their convex combination αm_1 + (1 − α)m_2 with α ∈ ]0, 1[ will have more than N focal elements (take, for example, two categorical mass functions bearing on two distinct sets). This means that a convex combination of two elements of S may not be in S, hence S is not a simplex. Actually, as fixing a collection of N focal elements defines an (N − 1)-standard simplex, the set of mass functions with a support of cardinality N is a set of simplices of identical dimensionalities, and it is possible to transform any of these simplices into any of the others (by permuting focal elements). Finally, sampling N focal elements so that they describe a single simplex among those of S, and then sampling a vector from this (N − 1)-dimensional simplex, is equivalent to sampling from the collection of simplices.

This leads to Algorithm 7, which is not that different from Algorithm 5: we simply add an instruction to control the size of the support. To this end, we use the function sampleWithoutReplacement, which simply randomly samples N elements out of P(Ω). In this algorithm, we


only consider exponential sampling instead of gamma sampling, leading to a uniform distribution over the simplex: as the dimensions of the simplex are ordered randomly during the selection of the simplex of interest (among the set of isomorphic simplices), promoting one focal element with respect to the others would be meaningless.

Algorithm 7: Algorithm to sample from a set of isomorphic simplices.
Input: frame Ω, number of focal elements N
Output: mass function m
1  P ← generatePowerSetOf(Ω);
2  F ← sampleWithoutReplacement(P, N);
3  foreach i ≤ N do
4      m_i ← randomlySample(Exp(1));
5  m ← {(F_1, m_1/Σ_k m_k), (F_2, m_2/Σ_k m_k), . . . , (F_N, m_N/Σ_k m_k)};
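In R, Algorithm 7 may be sketched as follows (helper names are ours):

# Algorithm 7 in R: draw a support of exactly N subsets at random, then
# spread mass uniformly over the corresponding simplex.
fixed_support_mass <- function(n_omega, N) {
  subsets <- lapply(seq_len(2^n_omega - 1),
                    function(b) which(bitwAnd(b, 2^(0:(n_omega - 1))) > 0))
  supp <- sample(subsets, N)       # sampleWithoutReplacement (Line 2)
  w <- rexp(N)                     # uniform over the selected simplex
  list(focal = supp, mass = w / sum(w))
}
m <- fixed_support_mass(4, 3)      # a random mass with exactly 3 focal elements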

5.2.2. Consonant mass functions with maximal support

Since they play, along with consistent mass functions, an important role in practical applications, let us consider the set of consonant mass functions with the maximal number of focal elements, i.e. |Fm| = |Ω|. There is a one-to-one correspondence between permutations of the elements of Ω and the set of simplices corresponding to such consonant mass functions (each permutation uniquely defining a maximal chain of focal elements according to inclusion). Algorithm 8 uses this relation to sample the focal elements of a given consonant mass, and then assigns masses to these focal elements.

Algorithm 8: Algorithm to sample consonant mass functions.
Input: frame Ω, vector of parameters of Dirichlet law π
Output: mass function m
1  O ← randomlySort(Ω);
2  foreach i ≤ |Ω| do
3      F_i ← {O_1, . . . , O_i};
4      m_i ← randomlySample(Gamma(π_i, 1));
5  m ← {(F_1, m_1/Σ_k m_k), (F_2, m_2/Σ_k m_k), . . . , (F_N, m_N/Σ_k m_k)};
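A minimal R sketch of Algorithm 8 (names are ours; π defaults to 1 for a uniform draw on the selected simplex):

# Algorithm 8 in R: a random permutation of Omega defines a maximal chain of
# nested focal elements, which then receive Dirichlet-distributed masses.
consonant_mass <- function(omega, pi_vec = rep(1, length(omega))) {
  o <- sample(omega)                                  # randomlySort (Line 1)
  chain <- lapply(seq_along(o), function(i) o[1:i])   # nested chain {O_1,...,O_i}
  w <- rgamma(length(chain), shape = pi_vec, rate = 1)
  list(focal = chain, mass = w / sum(w))
}
m <- consonant_mass(c("w1", "w2", "w3"))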

Remark 1. Simulating a Bayesian mass and then transforming it into a consonant one using some transforms28 still provides a random sampling method; however, it would come with no guarantee about the statistical behavior of the obtained


distribution (this is obvious when we take into account that there are multiple possible transforms from Bayesian to consonant masses).

Remark 2. It is possible to rewrite Algorithm 8 so that it fits the structure of one of the previous algorithms (for instance Algorithm 6 or 7), but with a dedicated function generateConsonantSupport() to build the support at Line 2 (this function is described in Algorithm 9). As a consequence, it appears that Algorithms 6, 7 and 8 have the same structure: all of them are based on a particular definition of the support, followed by a classical sampling on a simplex. Therefore, it is straightforward to mix them, i.e. to derive algorithms to deal with consonant mass functions which are forced to be k-additive, k-intolerant, etc., and/or to have a restricted number N of focal elements. As a corollary, we are now capable of simulating consonant mass functions with a support of fixed cardinality, and not only of maximal cardinality.

Algorithm 9: Algorithm of function generateConsonantSupport().
Input: frame Ω
Output: a support F
1  O ← randomlySort(Ω);
2  foreach i ≤ |Ω| do
3      F_i ← {O_1, . . . , O_i};
4  F ← {F_1, F_2, . . . , F_|Ω|};
5  return F;

5.2.3. Consistent mass functions with maximal support

Let us now consider the case of consistent mass functions with maximal support, i.e. where the intersection of all the elements of the support is exactly a singleton (such mass functions live in a set of |Ω| different simplices). The idea is to randomly select a singleton ω of Ω, to randomly sample a mass function on the frame Ω \ ω, and finally to add ω to all the focal elements. This is displayed in Algorithm 10.

Remark 3. Remark 2 is also valid for consistent mass functions: it is possible to rewrite Algorithm 10 so that its entire specificity lies in the definition of the support (dealt with in a separate function generateConsistentSupport()). As a consequence, it is possible to mix a condition on the cardinality of the focal elements with a constraint on the number of focal elements in the definition of a consistent support. It is also possible to decide the size of the intersection of the focal elements: in Algorithm 10, this intersection is ω, but one can select more than a single singleton at random in Line 1 and modify the algorithm accordingly.


Algorithm 10: Algorithm to sample consistent mass functions.
Input: frame Ω, vector of parameters of Dirichlet law π
Output: mass function m
1  ω ← sampleSingleton(Ω);
2  F ← generatePowerSetOf(Ω \ ω);
3  N ← sizeOf(F);
4  foreach i ≤ N do
5      m_i ← randomlySample(Gamma(π_i, 1));
6  m ← {(F_1 ∪ {ω}, m_1/Σ_k m_k), (F_2 ∪ {ω}, m_2/Σ_k m_k), . . . , (F_N ∪ {ω}, m_N/Σ_k m_k)};
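A possible R sketch of Algorithm 10 (names are ours; here the mass on the selected simplex is drawn uniformly):

# Algorithm 10 in R: draw a singleton w, build the power set of Omega \ {w}
# (including the empty set, which yields {w} itself), and add w to every set.
consistent_mass <- function(omega) {
  w0 <- sample(omega, 1)                        # sampleSingleton (Line 1)
  rest <- setdiff(omega, w0)
  n <- length(rest)
  subsets <- lapply(0:(2^n - 1),
                    function(b) rest[bitwAnd(b, 2^(0:(n - 1))) > 0])
  supp <- lapply(subsets, function(E) c(w0, E)) # w0 belongs to every focal set
  g <- rgamma(length(supp), shape = 1, rate = 1)
  list(focal = supp, mass = g / sum(g))
}
m <- consistent_mass(c("w1", "w2", "w3"))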

5.2.4. Sets of isomorphic simplices: the general case

There is a last case which is apparently a bit more complicated but which, in practice, reduces to the same setting while generalizing all the previous ones. Consider, for instance, the set of simple support mass functions (which have only two focal elements, one being Ω). It mixes two of the previous cases, as one wants to control both the cardinality of the focal elements (involving a condition c, as in Algorithm 6) and their number (as achieved in Algorithm 7). Although less classical, it might be interesting to sample mass functions which require such an algorithm. Beyond the case of simple support functions (which can be sampled in an obvious way), other sets corresponding to such a situation include:

• the set of mass functions with one and only one focal element of each cardinality;
• the set of k-additive mass functions with exactly ℓ focal elements;
• etc.

In fact, in Algorithm 6, the instruction calling generateSupportOnCondition partitions the power set into two sets: (1) the focal elements, and (2) the elements of the power set which are not focal elements. Conversely, in Algorithm 7, the partition is naive, as all the elements of the power set are potential focal elements; however, only a handful of them are randomly selected. It is possible to mix the two algorithms by (1) partitioning the power set according to condition c (i.e. extracting a support as with generateSupportOnCondition), and (2) selecting N focal elements among those enforcing the condition c. This would help sampling, for instance, a 4-additive mass function with 5 focal elements. It is possible to go even a step further, by partitioning the power set at the first stage into more than 2 sets, each corresponding to a dedicated condition c_i, and then sampling N_i focal elements respecting condition c_i, ∀i. This can be done very simply: if

• part is a partition of [0, . . . , |Ω|] (we denote by part(i) the ith element of part);
• f is a function from part onto N* that maps to part(i) the number of elements of P(Ω) with cardinality ∈ part(i);


• g is a function from part into [0, 2^|Ω|] such that g ≤ f and Σ_i g(part(i)) ≤ 2^|Ω|;

Algorithm 11: General algorithm encompassing Algorithms 6 and 7. Input: frame Ω, partition part, list of numbers of focal elements g Output: mass function m 1 P ← generatePowerSetof(Ω); 2 Ppart ← partitionPowerSet(P, part); 3 F ←∅ ; 4 n ← sizeOf(part); 5 foreach i ≤ n do 6 F ← F ∪ sampleWithoutReplacement(Ppart(i), g(i)); 7 8 9 10

N ← sizeOf(F ); foreach i ≤ N do mi ← randomlySample(Exp(1));       m 1 m2  |N | , F , . . . , F ; m← F1 , mm , , 2 |N | mk mk k k

k

k

Naturally, it is also possible to add a global constraint, such as the consonancy or the consistency (if for instance, we aim at sampling the set of consonant mass functions with exactly no focal elements of cardinality 3 and |Ω|). This can easily be achieved by replacing generatePowerSetof() at Line 1, by a function such as that of Algorithm 9).


5.3. Case 3: S is a set of simplices of different dimensionality

So far, whatever the case (Case 1: S is a simplex; Case 2: S is a set of isomorphic simplices), the number of focal elements is either fixed (N or |Ω|) or can be calculated a priori (e.g., the number of focal elements of cardinality ≤ k is Σ_{i=1}^k |Ω|!/(i!(|Ω| − i)!)). This sounds natural: if one does not know the number of focal elements, it is impossible to determine the dimensionality of the simplex (or of the set of simplices) to sample. On the other hand, we might be interested in sampling a mass function whose number of focal elements is not known a priori. In practice, it means sampling from a set of simplices with different dimensionalities: naturally, before the sampling, one does not know which of the simplices will be sampled, and its dimensionality cannot be forecast. For example, one may be interested in sampling the set of all consonant mass functions, whatever their number of focal elements. To do so, we propose two distinct methods.

5.3.1. Using the Dirichlet distribution

The fact is that, when sampling a set of mass functions with supports of different cardinalities, the probability of generating a mass whose support is not of maximal size is null. The reason is the following: the Lebesgue measure of a sub-simplex of dimension k is null when it is embedded in a vector space of dimension > k. Therefore, when considering the set of all consonant mass functions, the only way to sample a mass with strictly fewer than |Ω| focal elements is to sample one or more zeros by chance in the gamma sampling, an event that has probability zero. Hence, in practice, sampling the set of consonant mass functions is equivalent to sampling the set of consonant mass functions with a maximal number of focal elements (i.e. |Ω|), this latter set being a collection of simplices of identical dimensions. This is why, in practice, Algorithms 8 and 10 can be used to sample the whole set of consonant / consistent mass functions whatever the support. The user should nevertheless be aware that mass functions with fewer than |Ω| focal elements, although theoretically possible, will never be sampled (as they have a null probability). At most, some very small mass assignments can be obtained by tuning the parameter π accordingly. This remark gives a particularly important role to the previous algorithms: beyond the cases of consonant / consistent mass functions, any algorithm dealing with a fixed (and maximal) size of support is also adapted to sample any of its sub-simplices.

5.3.2. Using the rejection method

Nonetheless, it sometimes makes sense to assume that the size of the support may vary across samplings: for example, when simulating different experts that will provide different focal elements with more or less precise opinions. Another example is when testing a system, as in this case we are interested in simulating a wide range of different types of mass functions with different numbers of focal elements. To do

5.3.2. Using the rejection method

Nonetheless, it sometimes makes sense to assume that the size of the support may vary across samplings: for example, when simulating different experts that provide different focal elements with more or less precise opinions; or when testing a system, in which case we are interested in simulating a wide range of mass functions with different numbers of focal elements. To do so, we may want to generate mass functions with supports of various sizes, regardless of their probability of observation under a uniform sampling. It then becomes necessary to add a categorical distribution over the set of possible supports (or simplices) from which we want to sample masses. In the case where the sampling over the set of supports is uniform, this can be implemented in a very generic algorithm based on the rejection method, usable for consonant, consistent or various other types of mass functions. The idea of Algorithm 12 is the following: by generating the power set of the power set, we obtain a list of all the possible supports; then, in the while loop, we keep on sampling supports until one satisfies a given condition c. The rejection method being really time-consuming, we advise using or adapting one of the previous algorithms whenever possible (as a matter of fact, all the algorithms presented in this article, apart from this last one, have the same computational complexity, as they involve at most 2^N gamma samplings). The interest of Algorithm 12, however, lies in its genericity: the condition c may encode any type of constraint, such as being both k-additive and consonant, or being both consistent and having fewer than N focal elements, etc. By sampling the support and the mass assignments in separate but uniform ways, this algorithm is very useful to generate particular types of mass functions. Its user must nevertheless be aware that the sampling distribution D it induces over M_Ω is not straightforward, and not necessarily uniform over the set of simplices involved. Thus, although very useful, its statistical properties are no longer controlled (just as with the algorithms reviewed in the state of the art), and it should be handled accordingly.

Algorithm 12: Sampling a mass function from a simplex uniformly drawn from the set of possible simplices, up to a condition.
Input: frame Ω, condition c
Output: mass function m
1 P ← generatePowerSetof(Ω);
2 PP ← generatePowerSetof(P);
3 flag ← 0;
4 while flag == 0 do
5     F ← sampleWithoutReplacement(PP, 1);
6     flag ← c(F);

7  N ← sizeOf(F);
8  foreach i ≤ N do
9      m_i ← randomlySample(Exp(1));
10 m ← {(F_1, m_1/∑_k m_k), (F_2, m_2/∑_k m_k), . . . , (F_N, m_N/∑_k m_k)};
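A direct Python transcription of Algorithm 12 could look as follows (the helper names are illustrative assumptions). Note that PP contains 2^(2^|Ω|−1) − 1 candidate supports, so this enumeration is only tractable for very small frames, which is another reason to prefer the dedicated algorithms when they apply.

    from itertools import combinations
    import random

    def powerset(collection):
        # All non-empty subsets of a collection.
        s = list(collection)
        return [frozenset(c) for r in range(1, len(s) + 1)
                for c in combinations(s, r)]

    def algorithm12(omega, condition):
        # Rejection method: draw candidate supports uniformly until one
        # satisfies the condition, then sample masses uniformly on the
        # selected simplex via normalized Exp(1) draws.
        P = powerset(omega)   # all possible focal elements
        PP = powerset(P)      # all possible supports
        while True:
            support = sorted(random.choice(PP), key=len)
            if condition(support):
                break
        weights = [random.expovariate(1.0) for _ in support]
        total = sum(weights)
        return {F: w / total for F, w in zip(support, weights)}

    # Example condition c: consonant (nested focal elements) with at most
    # 2 focal elements. The support is sorted by cardinality, so checking
    # consecutive inclusions suffices.
    def consonant_at_most_2(support):
        return (len(support) <= 2 and
                all(a <= b for a, b in zip(support, support[1:])))

    m = algorithm12({"w1", "w2", "w3"}, consonant_at_most_2)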

5.4. S is not a simplex

In the previous sections, we provided different algorithms to sample (uniformly or not) mass functions whose support F_m had various features and defined various collections of simplices. However, once F_m was sampled or defined, no constraint was imposed on the possible mass m(E) of a focal element E; that is, the algorithms could sample any value m(E) ∈ [0, 1]. In the more general case, S is not necessarily a simplex. It is then not possible to provide a general algorithm, apart from the rejection method, which consists in repeatedly sampling C_{2^|Ω|−1} according to some law D and rejecting the result as long as it does not belong to S; the set of sampled mass functions then follows D restricted to S. One only has to make sure that S has a non-null Lebesgue measure in C_{2^|Ω|−1}, otherwise rejection may occur with probability one (and the probability of the algorithm ending is zero). More generally, this method is efficient only when the probability of rejection is not too high. To avoid excessively long runs of rejections, the sampling should be performed on the smallest simplex embedding S, and the selection of this simplex should follow our previous recommendations.

This last case typically concerns situations where some (hard) numerical constraints are imposed on masses. These constraints are general: they may be directly given, e.g., a mass function for which the plausibility of an event A has a minimal value, or induced by the context, e.g., a mass function not too "distant"8 from a reference mass function m∗. Also, as mentioned, such constraints may be combined with constraints on the support F_m, in which case the corresponding restrictions should be applied. The next example illustrates such a situation.

Example 5. Assume we have Ω = {ω1, ω2, ω3} and we want to test the behavior of some function f that makes sense only for consonant mass functions and that computes a "gain" of information between two consonant functions m∗ and m having the same set of focal elements, one of which (m) is more specific than the other (m∗). In the consonant case, m being more specific than m∗ means that Pl_m(ω_i) ≤ Pl_m∗(ω_i) for i = 1, 2, 3. Assuming that m∗ is some reference (normalized) mass function with F_m∗ = {{ω1}, {ω1, ω2}, {ω1, ω2, ω3}}, we should sample m from the simplex defined by F_m = {{ω1}, {ω1, ω2}, {ω1, ω2, ω3}} (using Algorithm 8 with a fixed ordering) and reject the sample each time the inequalities Pl_m(ω_i) ≤ Pl_m∗(ω_i), i = 1, 2, 3, are not satisfied.
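Under the same illustrative conventions as the previous sketches (helper names and the reference masses for m∗ are assumptions, not taken from the paper), Example 5 amounts to a rejection loop on the contour function, i.e., on the plausibilities of the singletons.

    import random

    # Fixed nested support from Example 5.
    F1 = frozenset({"w1"})
    F2 = frozenset({"w1", "w2"})
    F3 = frozenset({"w1", "w2", "w3"})
    support = [F1, F2, F3]
    m_star = {F1: 0.5, F2: 0.3, F3: 0.2}  # arbitrary reference mass function

    def pl_singleton(m, w):
        # Plausibility of the singleton {w}: total mass of focal sets containing w.
        return sum(v for F, v in m.items() if w in F)

    def sample_more_specific(m_star, support, omega=("w1", "w2", "w3")):
        # Rejection sampling: draw uniformly on the simplex of the given support
        # (normalized Exp(1) draws) until Pl_m(w) <= Pl_{m*}(w) for every singleton.
        while True:
            weights = [random.expovariate(1.0) for _ in support]
            total = sum(weights)
            m = {F: w / total for F, w in zip(support, weights)}
            if all(pl_singleton(m, w) <= pl_singleton(m_star, w) for w in omega):
                return m

    m = sample_more_specific(m_star, support)

Since every focal element contains ω1, Pl_m(ω1) = Pl_m∗(ω1) = 1, so the binding constraints reduce to the plausibilities of ω2 and ω3 and the acceptance rate remains reasonable here.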

6. Conclusion

In this article, we reviewed the algorithms classically used to randomly generate mass functions, and we explained why their statistical properties are hard to characterize. Then, we recalled some basic mathematical background on the Dirichlet distribution, and we explained why it provides a suitable framework for the random generation of mass functions. Finally, we presented various algorithms adapted to the most classical types of mass functions, so that anyone facing a simulation problem in the framework of belief function theory may find an algorithm adapted to their problem.

References
1. N. Wilson, Algorithms for Dempster–Shafer theory, in Algorithms for Uncertainty and Defeasible Reasoning (Kluwer Academic Publishers, 2000), pp. 421–475.
2. N. Wilson and Q. Mary, A Monte-Carlo Algorithm for Dempster–Shafer Belief (Queen Mary and Westfield College, 1991).
3. V. Kreinovich, A. Bernat, W. Borrett, Y. Mariscal and E. Villa, Advances in the Dempster–Shafer Theory of Evidence (John Wiley & Sons, Inc., New York, USA, 1994).
4. S. Moral and A. Salmeron, A Monte-Carlo algorithm for combining Dempster–Shafer belief based on approximate pre-computation, Symbolic and Quantitative Approaches to Reasoning and Uncertainty (1999) 305–315.
5. I. Kramosil, Monte-Carlo estimates for belief functions, Tatra Mt. Math. Publ. (1999) 339–357.
6. D. A. Alvarez, Reduction of uncertainty using sensitivity analysis methods for infinite random sets of indexable type, Int. J. Approximate Reasoning 50 (2009) 750–762.
7. G. J. Klir, Uncertainty and Information (John Wiley & Sons, Inc., Hoboken, NJ, USA, 2005).
8. A.-L. Jousselme and P. Maupin, Distances in evidence theory: Comprehensive survey and generalizations, Int. J. Approximate Reasoning 53 (2012) 118–145.
9. S. Destercke and T. Burger, Toward an axiomatic definition of conflict between belief functions, IEEE Trans. Systems, Man and Cybernetics, Part B (2012) to appear.
10. N. B. Abdallah, N. M. Voyneau and T. Denoeux, Combining statistical and expert evidence within the Dempster–Shafer framework: Application to hydrological return level estimation, in Belief Functions (2012), pp. 393–400.
11. Y. Bi, J. Guan and D. A. Bell, The combination of multiple classifiers using an evidential reasoning approach, Artif. Intell. 172 (2008) 1731–1751.
12. G. Shafer and P. P. Shenoy, Local computation in hypertrees, Working Paper No. 201, School of Business, University of Kansas, 1991.
13. J. Dezert, D. Han, Z. Liu and J.-M. Tacnet, Hierarchical proportional redistribution for bba approximation, in Belief Functions: Theory and Applications — Proc. 2nd Int. Conf. Belief Functions (2012), pp. 275–283.
14. G. Shafer, A Mathematical Theory of Evidence (Princeton University Press, New Jersey, 1976).
15. P. Smets and R. Kennes, The transferable belief model, Artificial Intelligence 66 (1994) 191–234.
16. J. Kohlas and P. Monney, A Mathematical Theory of Hints: An Approach to the Dempster–Shafer Theory of Evidence, Lecture Notes in Economics and Mathematical Systems (Springer, 1995).
17. M. Grabisch, k-order additive discrete fuzzy measures and their representation, Fuzzy Sets and Systems 92 (1997) 167–189.
18. J. A. Barnett, Computational methods for a mathematical theory of evidence, in Proc. 7th Int. Joint Conf. Artificial Intelligence, Vol. 2 (Morgan Kaufmann Publishers Inc., 1981), pp. 868–875.
19. S. Destercke, D. Dubois and E. Chojnacki, Unifying practical uncertainty representations, II: Clouds, Int. J. Approx. Reasoning 49 (2008) 664–677.

20. T. Burger and S. Destercke, Random generation of mass functions: A short howto, in Belief Functions: Theory and Applications — Proc. 2nd Int. Conf. Belief Functions (2012), pp. 145–152.
21. A.-L. Jousselme and P. Maupin, On some properties of distances in evidence theory, in Proc. Workshop on Theory of Belief Functions (2010).
22. B. Tessem, Approximations for efficient computation in the theory of evidence, Artificial Intelligence 61 (1993) 315–329.
23. M. Bauer, Approximation algorithms and decision making in the Dempster–Shafer theory of evidence — an empirical study, Int. J. Approximate Reasoning 17 (1997) 217–237.
24. N. L. Johnson and S. Kotz, Continuous Distributions (John Wiley, New York, USA, 1970).
25. F. Cuzzolin, A geometric approach to the theory of evidence, IEEE Trans. Systems, Man and Cybernetics, Part C: Applications and Reviews 38 (2008) 522–534.
26. A. Jøsang and Z. Elouedi, Interpreting belief functions as Dirichlet distributions, Symbolic and Quantitative Approaches to Reasoning with Uncertainty (2007) 393–404.
27. E. Quaeghebeur, Characterizing the set of coherent lower previsions with a finite number of constraints or vertices, in Proc. UAI (2010), pp. 466–473.
28. D. Dubois, H. Prade and P. Smets, A definition of subjective possibility, Int. J. Approximate Reasoning 48 (2008) 352–364.
