Constructing Dynamic Frames of Discernment in Cases of Large Number of Classes

Yousri Kessentini¹, Thomas Burger², and Thierry Paquet¹

¹ Université de Rouen, Laboratoire LITIS EA 4108, site du Madrillet, St Etienne du Rouvray, France
{yousri.kessentini,thierry.paquet}@univ-rouen.fr
² Université Européenne de Bretagne, Université de Bretagne-Sud, CNRS, Lab-STICC, F-56017 Vannes cedex, France
[email protected]

Abstract. The Dempster-Shafer theory (DST) is particularly interesting to deal with imprecise information. However, it is known for its high computational cost, as dealing with a frame of discernment Ω involves the manipulation of up to 2^|Ω| elements. Hence, classification problems where the number of classes is too large cannot be considered. In this paper, we propose to take advantage of a context of ensemble classification to construct a frame of discernment where only a subset of classes is considered. We apply this method to script recognition problems, which by nature involve a tremendous number of classes.

Keywords: Dempster-Shafer theory, Dynamic frames of discernment, Data fusion.

1 Introduction

The Dempster-Shafer theory (DST) [1, 2] is a particularly interesting theory to deal with imprecise, conflicting or partial sources of information. The counterpart of this efficiency is its high computational complexity. One of the main reasons for this complexity is related to the state space (or frame of discernment): When the actual value ω0 taken by a variable W is only known to belong to a set Ω = {ω1, ..., ω|Ω|} of possible values, the distributions encoding some knowledge on W are defined over up to 2^|Ω| focal elements (the elements of the power set of Ω, noted P(Ω)), leading to an exponential number of elements to deal with. To balance that, several methods exist. The most natural idea is to try to reduce the number of such focal elements, by forcing some mass assignments to 0, so that the remaining ones display a particular structure which is supposed to be relevant with respect to the knowledge encoded (for instance, Bayesian [3], consonant [4], and k-additive mass functions [5]). Another natural idea is to reduce the size of Ω with various low-cost processing steps, so that the refined modeling of the DST is devoted only to the most interesting possible values for W. In [6], mass functions are defined directly on Ω rather than on P(Ω), but this is only possible if Ω is fitted with a partially ordered structure. Finally, in [7], the authors propose to consider coarsened frames, to reduce the computational cost of the subsequent Dempster's rule. Conversely, many works consider a problem dual of ours, i.e. constructing an exhaustive frame thanks to multiple pieces of evidence based on partial frames, such as in [8] or [9].

In this paper, we consider classification problems (i.e. the variable W is a class variable) where the number of classes involved (i.e. |Ω|) is very large, such as in handwriting word recognition, where a dictionary may contain up to 100,000 words. To face the corresponding computational issue, we propose to reduce the size of Ω. To do so, we take advantage of a context of ensemble classification to construct a frame of discernment where only a subset of classes is considered. These classes are dynamically selected according to the diversity of the classifiers involved. We propose and compare different strategies to build such a dynamic frame. We show that the proposed strategies considerably reduce the complexity of a DST approach to ensemble classification, while providing a statistically significant improvement of the classification performances with respect to classical probabilistic combination methods.

The paper is structured as follows: In Section 2, we recall the basics of handwriting word recognition, as well as some results on ensemble classification in the context of the DST. In Section 3, we present four different strategies to define dynamic frames of discernment. Finally, we compare them in Section 4 on Latin and Arabic handwriting datasets, and we discuss the results.

2 Handwriting Word Recognition

2.1 Background

One of the most popular techniques for automatic handwriting recognition is to use generative classifiers based on Hidden Markov Models (HMM) [10]. For each word ωi of a lexicon Ωlex = {ω1, ..., ωV} of V words, an HMM λi, i ≤ V, is defined, so that λi best fits a training set made of several different instances of words (these instances are called example words). Practically, this training phase is conducted by using the Viterbi EM or the Baum-Welch algorithm [10]. Then, when a new unknown word ω is considered (a test word from a testing set), the likelihoods P(ω|λi), ∀i ≤ V, are approximated by the likelihoods provided by the Viterbi decoding algorithm (noted L(ωi), ∀i), and ω is recognized as the ωj for which L(ωj) ≥ L(ωi), ∀i ≤ V. Generally, in the evaluation step, the classifier does not only provide the "best" class, but an ordered list of the TOP N best classes. Then, for each value of n ≤ N, a recognition rate can be computed as the percentage of words for which the ground truth class is proposed in the first n elements of the TOP N list. This complete set-up is called an HMM classifier.

In order to improve recognition accuracy, it is classical to define several HMM classifiers, each working on different features (the likelihood of the q-th classifier for ωi is then noted Lq(ωi)), and to combine them [11,12,13,14]. It has been established in [15] that using a set of three classifiers, working respectively on the upper contour of the pen mark, the lower contour, and the ink density, provides accurate results both on Latin and Arabic datasets. There are several ways to combine these classifiers. The most classical way is to consider the product of the Q likelihoods for each class. It corresponds to the assumptions that the features used by the classifiers are independent, and that the product of the likelihoods¹ is the likelihood of the resulting ensemble classification. In the sequel, we refer to this method as the reference probabilistic method (RPM). On the other hand, we have demonstrated in [16,17] the superiority of several evidential combination methods over several classical probabilistic combination strategies, including the RPM (which appears to be the best non-evidential method).
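To make the TOP N evaluation protocol described above concrete, here is a minimal Python sketch (the function name and the toy data are ours, not the paper's):

    def top_n_rates(ranked_lists, ground_truth, n_max):
        """TOP n recognition rate for n = 1..n_max: the percentage of
        test words whose ground truth class appears in the first n
        entries of the ordered list returned by the classifier."""
        rates = []
        for n in range(1, n_max + 1):
            hits = sum(1 for ranked, truth in zip(ranked_lists, ground_truth)
                       if truth in ranked[:n])
            rates.append(100.0 * hits / len(ground_truth))
        return rates

    # Toy example: 3 test words, lists ranked by decreasing Viterbi likelihood.
    ranked = [["cat", "car", "cap"], ["dog", "dot", "don"], ["sun", "son", "sin"]]
    truth = ["car", "dog", "sin"]
    print(top_n_rates(ranked, truth, 3))   # [33.33..., 66.66..., 100.0]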

2.2 DST Combination of HMM Classifiers

We assume that the reader is familiar with the basic elements of the Dempster-Shafer theory. Unfamiliar readers should refer to [1, 2], where the following notions are presented: power set, mass function, focal element, vacuous/categorical/consonant/Bayesian mass functions, conjunctive (or Dempster's rule of) combination, pignistic transform and discounting. The combination of several probabilistic classifiers in the DST is a widely studied topic [11, 18, 19, 20, 21, 22, 23, 24, 25], which we have already reviewed in previous works of ours [17, 26]. Here, to combine the results of several HMM classifiers, we use the following procedure, inspired from previous works of ours [17]: First, for each of the Q classifiers, we normalize the likelihoods so that they sum up to one over the whole set of classes. Second, a mass function is derived from each of the Q classifiers. Third, the accuracy rates of the classifiers (derived from a cross-validation procedure) are used to weight each mass function according to the reliability of each classifier. Fourth, the Q mass functions are combined together. Finally, a probabilistic transform is applied, and the so-derived probability values are sorted decreasingly to provide the TOP N list.

Concerning the first and second steps, several methods may be used. We have compared several of them in [16, 17], and finally, we consider the use of a sigmoid function for the normalization, and the use of the inverse pignistic transform [4] for the conversion into a mass function. The inverse pignistic transform converts an initial probability distribution p into a consonant mass assignment. The resulting consonant mass assignment, denoted by p̂, is built as follows: The elements of Ω are ranked by decreasing probabilities such that p(ω1) ≥ ... ≥ p(ω|Ω|), and we have

    p̂({ω1, ω2, ..., ω|Ω|}) = p̂(Ω) = |Ω| × p(ω|Ω|)
    p̂({ω1, ω2, ..., ωi}) = i × [p(ωi) − p(ωi+1)]    ∀ i < |Ω|        (1)
    p̂(·) = 0 otherwise.

¹ These likelihoods are possibly weighted by the TOP 1 accuracy rate of each classifier, if the information is available after a cross-validation procedure.
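As an illustration, here is a minimal Python sketch of the inverse pignistic transform of Eq. (1); it turns a probability distribution into a consonant mass function whose focal elements are nested sets of the best-ranked classes (function and variable names are ours, not the paper's):

    def inverse_pignistic(p):
        """p: dict mapping each class to its probability (summing to 1).
        Returns a dict mapping frozensets of classes to their mass."""
        ranked = sorted(p, key=p.get, reverse=True)   # p(w1) >= ... >= p(wK)
        K = len(ranked)
        mass = {}
        for i in range(1, K):                         # i * [p(wi) - p(wi+1)]
            m = i * (p[ranked[i - 1]] - p[ranked[i]])
            if m > 0:
                mass[frozenset(ranked[:i])] = m
        mass[frozenset(ranked)] = K * p[ranked[-1]]   # |Omega| * p(w_|Omega|)
        return mass

    print(inverse_pignistic({"a": 0.5, "b": 0.3, "c": 0.2}))
    # {'a'}: 0.2, {'a','b'}: 0.2, {'a','b','c'}: 0.6
    # (its pignistic transform gives the original p back)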


The reason for this choice is manifold: First, it corresponds to the best trade-off between computational complexity and performance. Second, it has no parameter to tune. Third, it provides a consonant mass function, which is interesting for computational as well as epistemological reasons. As a matter of fact, the result of a classifier is an ordered list, the natural representation of which in the DST is a consonant mass function [26]. Then, the probability distribution provided by each classifier can be seen as the pignistic transform of a particular consonant mass function, which is recovered via the inverse pignistic transform.

Concerning the third step, it is either possible to use all the TOP N accuracy rates, ∀N ≤ |Ωlex|, in a manner similar to that of [17] (which generalizes the method of [11]), or to simply use the TOP 1 accuracy rates, by the application of a classical discounting. In spite of involving less information, we have chosen the second option. The reason is that exactly the same information (only the TOP 1 accuracy rates) can be used to weight the classifiers in the RPM (by multiplying each probability value given by a classifier by its TOP 1 accuracy rate). On the other hand, the method described in [17] (involving all the TOP N accuracy rates) has no counterpart in the RPM. Hence, by choosing the second option, we guarantee that the probabilistic and DST-based methods remain comparable.

In the fourth step, we consider by default a conjunctive combination. The reasons for such a default choice are those detailed in [27]. It is also possible to proceed differently, as detailed in [28], where different combinations are considered depending on the pairwise conflict among the classifiers. Despite its real interest from a performance point of view, the conditions required to make a choice among the different combinations are not adapted to handwriting recognition problems, and the framework proposed in [27] better corresponds to our situation. Finally, the probability transform we use is the pignistic transform, which sounds natural to remain coherent with the processing of step 2. Hence, if m̃ denotes the pignistic transform of a mass function m, and if p̂q denotes the consonant mass function recovered from the probability distribution pq provided by the q-th classifier, we consider the following classification procedure:

    ω* = arg max_i m̃(ω_i)    where    m = ∩_{q=1}^{Q} [p̂_q],

∩ denoting the conjunctive combination.
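The following sketch assembles steps three to five under the same assumptions (the names are ours, and the inverse_pignistic helper is the one sketched above; this is an illustration of the pipeline, not the authors' implementation):

    def discount(mass, alpha, frame):
        """Classical discounting: keep a fraction alpha of each mass and
        transfer the remainder to the whole frame Omega."""
        out = {fs: alpha * m for fs, m in mass.items()}
        omega = frozenset(frame)
        out[omega] = out.get(omega, 0.0) + (1.0 - alpha)
        return out

    def conjunctive(m1, m2):
        """Unnormalized conjunctive rule: the product mass of each pair of
        focal elements is transferred to their intersection (the mass of
        the empty set measures the conflict)."""
        out = {}
        for a, ma in m1.items():
            for b, mb in m2.items():
                out[a & b] = out.get(a & b, 0.0) + ma * mb
        return out

    def pignistic(mass):
        """BetP: share the mass of each non-empty focal element equally
        among its members, renormalizing by the non-conflicting mass."""
        total = sum(m for fs, m in mass.items() if fs)
        bet = {}
        for fs, m in mass.items():
            for w in fs:
                bet[w] = bet.get(w, 0.0) + m / (len(fs) * total)
        return bet

    # Two classifiers on a 3-class frame, TOP 1 accuracies 0.8 and 0.6:
    frame = {"a", "b", "c"}
    m1 = discount(inverse_pignistic({"a": 0.5, "b": 0.3, "c": 0.2}), 0.8, frame)
    m2 = discount(inverse_pignistic({"a": 0.4, "b": 0.4, "c": 0.2}), 0.6, frame)
    bet = pignistic(conjunctive(m1, m2))
    print(max(bet, key=bet.get))   # TOP 1 decision of the ensemble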

2.3 Computational Issues

In handwritten word recognition, the set of classes is very large with respect to the cardinality of the state space in classical DST problems (up to 100,000 words). When dealing with a lexicon of V words, the mass functions involved are defined on up to 2^V focal elements. Moreover, the conjunctive combination of two mass functions involves up to 2^(2V) multiplications and 2^V additions. Thus, the computational cost is exponential with respect to the size of the lexicon, and 100,000 words are not directly tractable.

To remain efficient, even for large vocabularies, it is mandatory either to reduce the complexity, or to reduce the size of the lexicon involved. To do so, as noted in the previous section, consonant mass functions (with only V focal elements) may be considered. In addition, it is also possible to reduce the size of the lexicon by eliminating all the word classes which are obviously not adapted to the test word under consideration. Hence, we consider only the few word classes among which a mistake is possible because of the difficulty of discrimination. Consequently, instead of working on Ωlex = {ω1, ..., ωV}, we use another frame ΩW, defined according to each particular test word W we aim at classifying. That is why we say that such a frame is dynamically defined. This strategy is rather intuitive and simple. On the other hand, to our knowledge, no comparison of the different strategies which can be used to define such frames has been published. This is achieved in this paper, and the next section presents several strategies that will be compared in the sequel.
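To give an order of magnitude of the figures above, a short computation suffices (caret notation as in the text):

    # For a lexicon of V words, a general mass function has up to 2^V focal
    # elements, and one conjunctive combination costs up to 2^(2V)
    # multiplications, whereas a consonant mass function has only V focal
    # elements.
    for V in (10, 15, 20, 100):
        print(f"V={V:>3}: 2^V = {2**V:.2e} focal elements, "
              f"2^(2V) = {2**(2*V):.2e} multiplications")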

3 Dynamic Frames of Discernment

In this section, we describe several strategies to derive a frame ΩW of reduced size from Ωlex, the latter being too large. Note that, depending on the test word W, the number of classes among which the discrimination remains difficult may vary; hence, naturally, the size of ΩW may vary accordingly. On the other hand, it is possible to force the frames to be of the same cardinality whatever the test word (by truncating or extending them with useless classes). Hence, there are two options: a fixed or a variable cardinality of the frames ΩW, ∀W. In this paper, we force the cardinality to be fixed. It does not improve the results, but it provides a more standard basis for the comparison of the different strategies, as a poor strategy to select the classes building ΩW will not be balanced by looser constraints on the acceptance/rejection of the classes.

Let us consider Q classifiers. Each classifier q provides an ordered list lq = {ω1^q, ..., ωN^q} of the TOP N best classes and their corresponding likelihoods, noted L(ωi^q), ∀i ≤ N. The different strategies described below take as input the lists lq, ∀q ≤ Q, and construct a dynamic frame of controlled size M ≤ N ≤ |Ωlex|.

3.1 Strategy 1: Intersection

Here, the frame ΩW is made of all the words which are common to the output lists lq, ∀q ≤ Q. Obviously, |ΩW| depends on the lists: If the Q classifiers globally concur, their respective lists are similar and an important proportion of their N words is likely to be found in their intersection. On the contrary, if the Q classifiers mostly disagree, very few words belong to the intersection of the lists. Here, we expect |ΩW| to remain constant. Hence, the lists of all the Q classifiers are considered for increasing values of N, until the intersection of the lists is made of exactly M words (draws are sorted out randomly). As the algorithm on which the classifiers are based requires computing the probability values of all the Ωlex words anyway, the lists with N = |Ωlex| are directly available, and this iterative strategy has no extra cost from a computational point of view. Intuitively, the motivation for this strategy is to use the intersection scheme to reduce the number of potential classes for the test word. The idea behind it is that the conjunctive combination is also based on intersections of sets, so that, by discarding all the empty intersections before its computation, unnecessary computations are suppressed while the important ones are kept.
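A minimal sketch of this strategy, under our naming conventions (lists contains the Q full rankings, best class first):

    import random

    def intersection_frame(lists, M):
        """Strategy 1: grow N until the TOP-N lists share at least M
        words, then randomly keep M of them (the random draw for ties)."""
        for N in range(1, len(lists[0]) + 1):
            common = set(lists[0][:N]).intersection(*(l[:N] for l in lists[1:]))
            if len(common) >= M:
                common = list(common)
                random.shuffle(common)     # draws are sorted out randomly
                return set(common[:M])
        return set(lists[0])               # unreachable for full rankings

    lists = [["w1", "w2", "w3", "w4"], ["w2", "w1", "w4", "w3"],
             ["w1", "w4", "w2", "w3"]]
    print(intersection_frame(lists, 2))    # {'w1', 'w2'}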

3.2 Strategy 2: Union

This strategy is analogous to the previous one, except that the frame of discernment is made of the union of the lists rather than their intersection. Contrary to the previous strategy, if the Q classifiers globally concur, their respective lists are similar and very few words belong to the union of the lists, whereas if the Q classifiers mostly disagree, an important proportion of their N words is found in the union. Hence, we adjust the value of N to control the size of the frame; in practice, a frame size between 15 and 20 is used. The idea motivating this strategy is the following: If a single classifier fails and gives too low a rank to the real class, the other classifiers cannot balance the mistake when the intersection strategy is considered. Then, the union may be preferable.
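The corresponding sketch only swaps the set operation (same assumptions and naming conventions as for Strategy 1):

    import random

    def union_frame(lists, M):
        """Strategy 2: grow N until the union of the TOP-N lists reaches
        M words, again with a random draw if it overshoots M."""
        for N in range(1, len(lists[0]) + 1):
            union = set().union(*(l[:N] for l in lists))
            if len(union) >= M:
                union = list(union)
                random.shuffle(union)
                return set(union[:M])
        return set(lists[0])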

3.3 Strategy 3: Borda Count

A major problem with the two previous strategies is that the rank within each list is not involved in the creation of ΩW. Hence, we propose to use a Borda Count procedure: Each class receives a number of votes corresponding to the sum of its ranks in the Q lists. Then, the M word classes with the smallest number of votes are selected to compose ΩW. From a computational cost point of view, this pre-processing is rather light, as it involves Q × |Ωlex| additions and a call to a sorting function. In practice, a frame size of 15 classes is used.
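A sketch of the Borda Count selection (rank 0 being the best class; names are ours):

    def borda_frame(lists, M):
        """Strategy 3: each word scores the sum of its ranks over the Q
        lists; the M smallest totals compose the frame."""
        votes = {}
        for l in lists:
            for rank, w in enumerate(l):   # Q x |Omega_lex| additions
                votes[w] = votes.get(w, 0) + rank
        return set(sorted(votes, key=votes.get)[:M])

    lists = [["w1", "w2", "w3"], ["w2", "w1", "w3"], ["w1", "w3", "w2"]]
    print(borda_frame(lists, 2))           # {'w1', 'w2'}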

3.4 Strategy 4: Probabilistic Pre-processing

In spite of its lack of accuracy, the RPM is interesting: First, it is rather cheap from a computational point of view. Second, whatever the value δ, it is possible to achieve an accuracy of (100 − δ)% at TOP N if N is large enough. Trivially, if N = |Ωlex|, then the TOP N accuracy rate is 100%, but most of the time, an accuracy rate of 100% is achieved for smaller values of N. Then, the idea is simply to select for ΩW the M best classes according to the RPM (among which the real class is likely to be), i.e. the M classes for which the product of the likelihoods is the greatest, and to use the DST ensemble classification as a tool to refine the decision. Thus, the RPM is used to discard all the classes but M (even if this corresponds to a great number of classes, this is the simple step involving less computation), and afterwards, the DST-based method is used to discard the remaining M − 1 classes (this discrimination being more complex, more computational resources are allotted to it), so that a single class (hopefully, the right one) remains. In practice, a frame size of 15 classes is used.

The four proposed strategies build a dynamic frame with a fixed size in order to reduce the computational cost. In practice, a frame size of at most 20 classes is used: this represents a good compromise between complexity and performance. Note that it is practically impossible to deal with a frame of 100 classes, due to the high computational cost.
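As an illustration of the pre-selection step of Strategy 4, the sketch below scores each word by its RPM product of likelihoods, computed in the log domain to avoid underflow (names are ours):

    import math

    def rpm_frame(likelihoods, M):
        """Strategy 4: keep the M classes with the largest product of
        likelihoods over the Q classifiers; likelihoods[q][w] is the
        normalized likelihood of word w under classifier q (each value
        may additionally be weighted by the TOP 1 accuracy rate)."""
        words = likelihoods[0].keys()
        score = {w: sum(math.log(l[w]) for l in likelihoods) for w in words}
        return set(sorted(score, key=score.get, reverse=True)[:M])

    likelihoods = [{"w1": 0.5, "w2": 0.3, "w3": 0.2},
                   {"w1": 0.4, "w2": 0.5, "w3": 0.1}]
    print(rpm_frame(likelihoods, 2))       # {'w1', 'w2'}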

4 Experiments and Results

4.1 Datasets and HMM Classifiers

Experiments have been conducted on two publicly available databases: the IFN/ENIT benchmark database of Arabic words and the RIMES database of Latin words. The IFN/ENIT database [29] contains a total of 32,492 handwritten words (Arabic script) corresponding to 946 Tunisian town/village names written by 411 different writers. Four different sets (a, b, c, d) are predefined in the database for training and one set (e) for testing. The RIMES database [30] is composed of isolated handwritten word snippets extracted from handwritten letters (Latin script). In our experiments, 36,000 snippets of words are used to train the different HMM classifiers and 3,000 words are used in the test. The dictionary is composed of 1,612 words. Even if these numbers of words are rather small with respect to a real-size lexicon (up to 100,000 words), they are far too large for a direct DST approach, as a frame of discernment of more than 20 classes is not tractable from a computational point of view.

Three classifiers are defined, each working on a different feature set: upper contour, lower contour and density, as described in [15] (see Fig. 1). The TOP 1 and TOP 2 accuracy rates of each of these classifiers are derived from a 10-fold cross-validation on the training sets; they are given in Table 1. This table clearly shows that the two datasets are of heterogeneous difficulty. Moreover, the lower contour is always the least informative feature. Practically, in these experiments, we only use the TOP 1 accuracy rate to weight the different classifiers during their combination, either in the RPM or in the DST-based methods. More precisely, the DST-based method is derived according to the four strategies described above: Intersection (S1), Union (S2), Borda Count (S3) and Probabilistic Pre-Processing, or PPP for short (S4). Practically, we consider for each word a dynamic frame made of 15 words, in order to have a large enough set while keeping a reasonable computational complexity.

Fig. 1. DST combination of HMM classifiers

Table 1. Individual performances of the HMM classifiers

                            IFN/ENIT            RIMES
                          Top 1   Top 2     Top 1   Top 2
    HMM 1: Upper contour  73.60   79.77     54.10   66.40
    HMM 2: Lower contour  65.90   74.03     38.93   51.57
    HMM 3: Density        72.97   79.73     53.23   65.83

4.2 Results and Discussion

Table 2. Accuracy rates of the various strategies on the two datasets

                         IFN/ENIT            RIMES
                       Top 1   Top 2     Top 1   Top 2
    S1: Intersection   80.30   83.90     65.50   74.90
    S2: Union          82.00   86.53     68.30   79.80
    S3: Borda Count    81.17   85.73     68.67   80.13
    S4: PPP            81.83   86.53     69.47   80.23
    RPM                80.07   83.23     64.80   73.10

Table 2 displays the results provided by the four strategies as well as those of the RPM. First of all, it appears that the worst results are given by the RPM and that the DST-based methods are always more efficient. Second, S1 provides the poorest results among the DST-based methods, with results rather similar to those of the RPM, whereas all the other strategies provide similar results which are far better than those of the RPM. To us, the most appealing aspect of DST-based methods with respect to probabilistic ones lies in the possibility for the various sources of information to remain imprecise. In other words, with them, it is possible that the first choice of a classifier is not the correct one. Hence, the intersection strategy, by preventing any mistake of a classifier from being balanced by the output of the other classifiers, prevents the combination of the sources from behaving according to DST principles. It therefore seems understandable to us that S1 behaves similarly to the RPM rather than to the other DST-based strategies. Thus, the various methods/strategies can be divided into two groups: Group 1 is made of S2, S3 and S4, and group 2 is made of S1 and the RPM.

More precisely, among group 1, it can be seen that (1) on RIMES, S4 is the best one and S3 is slightly better than S2, and (2) on IFN/ENIT, S2 is slightly better than S4, which is in turn better than S3. This would lead us to promote the fourth strategy, based on a probabilistic pre-processing. Nonetheless, this assertion should be motivated. Thus, the next point is to check whether the pairwise differences in the accuracy rates are significant or not. If a difference is significant, it means that the first method is clearly better than the second one. On the contrary, if the difference is not statistically significant, then the difference of performance is too small to decide the superiority of one method over the other (as the results would be slightly different with other training/testing sets). A significance test is a particular type of statistical hypothesis test. In our case, the null hypothesis is the equivalence of the methods. Practically, we use McNemar's test [31], which is a χ² test adapted to the comparison of proportions.

Table 3. The p-values of McNemar's test for all the pairwise comparisons on the IFN/ENIT dataset

                          S2            S3            S4            RPM
    S1: Intersection  6.6 × 10^-7     0.0124       4.0 × 10^-5     0.5050
    S2: Union              .         5.6 × 10^-3     0.6400       7.2 × 10^-8
    S3: Borda Count        .             .           0.0336       2.8 × 10^-3
    S4: PPP                .             .              .         6.5 × 10^-6

Table 4. The p-values of McNemar's test for all the pairwise comparisons on the RIMES dataset

                          S2            S3            S4            RPM
    S1: Intersection  6.0 × 10^-10  8.8 × 10^-2   7.1 × 10^-14     0.2175
    S2: Union              .          0.4363      1.3 × 10^-2    7.6 × 10^-10
    S3: Borda Count        .             .        9.1 × 10^-2    6.6 × 10^-13
    S4: PPP                .             .              .        4.8 × 10^-16

In Tables 3 and 4, we consider all the pairwise comparisons between two methods, and for each, we compute the p-value, i.e. the probability that the null hypothesis is true. The smaller the p-value, the more likely the difference of accuracy is to be significant. First of all, the p-values confirm our qualitative interpretation of the accuracy rates: On the two datasets, S1 and the RPM behave similarly, as the probabilities that the differences between the proportions are not significant are rather high (50.50% on IFN/ENIT, and 21.75% on RIMES). Moreover, let us put the methods in decreasing order of performance (S2, S4, S3, S1, RPM for IFN/ENIT and S4, S3, S2, S1, RPM for RIMES), and let us consider the p-values associated with the comparisons of two consecutive methods according to these orders (0.6400, 0.0336, 0.0124, 0.5050 for IFN/ENIT and 9.1 × 10^-2, 0.4663, 6.0 × 10^-10, 0.2175 for RIMES). In this setting, we compare each strategy to the one or two closest other strategies. It can be noted that, whatever the dataset, the smallest p-value (i.e. the most significant difference) corresponds to the comparison of the worst strategy of group 1 (S2, S3 and S4) and the best strategy of group 2 (S1 and the RPM), which stresses the relevance of these two groups.

Amongst group 1, the strategies are not sorted in the same order on the two datasets with respect to the accuracy rates, and the p-values are rather high, indicating that these methods are roughly equivalent. Nonetheless, S4 appears to be slightly more efficient, and, from our experiments, the latter strategy (Probabilistic Pre-Processing) should be preferred, even if this choice relies on rather weak assumptions, as the similarity of the different methods involved requires further experiments for a strong statistical discrimination. Finally, let us point out that the p-value associated with the comparison of S4 and the RPM is so small that it is almost immaterial. Hence, it proves that, for the kind of data involved, the choice of the combination method is no longer questionable, as the DST-based method is definitely more efficient.
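To make the significance test concrete, here is a minimal sketch computed from the discordant pairs, using SciPy's χ² distribution; the continuity-corrected statistic shown is one common variant of McNemar's test and is our choice, not necessarily the one used in the original experiments:

    from scipy.stats import chi2

    def mcnemar_p(only_a, only_b):
        """only_a / only_b: numbers of test words correctly classified by
        exactly one of the two compared methods (the discordant pairs).
        Returns the p-value of the chi-square statistic (1 d.o.f.),
        with continuity correction."""
        stat = (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)
        return chi2.sf(stat, df=1)

    print(mcnemar_p(60, 30))   # approx. 0.002: a significant difference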

5 Conclusion

In this article, we have considered a problem of classifier combination in the framework of the Dempster-Shafer theory. More precisely, we have considered problems where the set of classes is too large to be considered as a frame of discernment, such as in handwriting word recognition, where a lexicon may contain up to 100,000 words. Thus, we propose to select, for each test word, a reduced number of words from the lexicon (those among which the discrimination is the most difficult) to build the frame. This frame is dedicated to a particular test word, which led us to call it dynamic. Then, we propose several procedures to select the words of the lexicon to build the dynamic frame. We compare them on two different datasets corresponding to Latin and Arabic words, containing 1,612 and 946 word classes respectively. From our results, the DST-based method provides significantly more accurate results than the reference probabilistic method, in spite of the approximation due to the use of a dynamic frame which does not contain all the words. Thus, our method provides more accurate results while keeping the computational complexity under control. Moreover, among the various strategies to build this dynamic frame, the most efficient one corresponds to the selection of the M words which are ranked best according to the reference probabilistic method. As a conclusion, probabilistic ensemble classification seems to be an interesting pre-processing for a DST-based ensemble classification on a dynamic frame, when the number of classes in the problem is too great. Future work will include an exhaustive comparison of the various means to take into account information from cross-validation (using various types of discounting, or the method from [17]), and the use of multiple hypothesis testing to make a choice amongst the various strategies described in this article.

References

1. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
2. Smets, P., Kennes, R.: The transferable belief model. Artificial Intelligence 66(2), 191–234 (1994)
3. Voorbraak, F.: A computationally efficient approximation of Dempster-Shafer theory. International Journal on Man-Machine Studies 30, 525–536 (1989)
4. Dubois, D., Prade, H., Smets, P.: New semantics for quantitative possibility theory. In: Benferhat, S., Besnard, P. (eds.) ECSQARU 2001. LNCS (LNAI), vol. 2143, pp. 410–421. Springer, Heidelberg (2001)
5. Grabisch, M.: K-order additive discrete fuzzy measures and their representation. Fuzzy Sets and Systems 92, 167–189 (1997)
6. Masson, M.-H., Denoeux, T.: Belief functions and cluster ensembles. In: Sossai, C., Chemello, G. (eds.) ECSQARU 2009. LNCS, vol. 5590, pp. 323–334. Springer, Heidelberg (2009)
7. Denoeux, T., Yaghlane, A.B.: Approximating the combination of belief functions using the fast Moebius transform in a coarsened frame. International Journal of Approximate Reasoning 31(1-2), 77–101 (2002)
8. Janez, F., Appriou, A.: Theory of evidence and non-exhaustive frames of discernment: Plausibilities correction methods. International Journal of Approximate Reasoning 18(1-2), 1–19 (1998)
9. Schubert, J.: Constructing and reasoning about alternative frames of discernment. In: Proceedings of the Workshop on the Theory of Belief Functions (2010)
10. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 257–286 (1989)
11. Xu, L., Krzyzak, A., Suen, C.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst., Man, Cybern. (3) (1992)
12. Kim, J.H., Kim, K.K., Nadal, C.P., Suen, C.Y.: A methodology of combining HMM and MLP classifiers for cursive word recognition. In: International Conference on Pattern Recognition, vol. 2, pp. 319–322 (2000)
13. Prevost, L., Michel-Sendis, C., Moises, A., Oudot, L., Milgram, M.: Combining model-based and discriminative classifiers: application to handwritten character recognition. In: International Conference on Document Analysis and Recognition, vol. 1, pp. 31–35 (2003)
14. Arica, N., Yarman-Vural, F.T.: An overview of character recognition focused on off-line handwriting. IEEE Trans. Systems, Man and Cybernetics, Part C: Applications and Reviews (2), 216–232 (2001)
15. Kessentini, Y., Paquet, T., Hamadou, A.B.: Off-line handwritten word recognition using multi-stream hidden Markov models. Pattern Recognition Letters 30(1), 60–70 (2010)
16. Kessentini, Y., Paquet, T., Burger, T.: Comparaison des méthodes probabilistes et évidentielles de fusion de classifieurs pour la reconnaissance de mots manuscrits. In: CIFED (2010)
17. Kessentini, Y., Burger, T., Paquet, T.: Evidential ensemble HMM classifier for handwriting recognition. In: Proceedings of IPMU, vol. 6178, pp. 445–454 (2010)
18. Al-Ani, A., Deriche, M.: A new technique for combining multiple classifiers using the Dempster-Shafer theory of evidence. Journal of Artificial Intelligence Research 17(1), 333–361 (2002)
19. Altınçay, H.: A Dempster-Shafer theoretic framework for boosting based ensemble design. Pattern Analysis & Applications 8(3), 287–302 (2005)
20. Burger, T., Aran, O., Caplier, A.: Modeling hesitation and conflict: A belief-based approach for multi-class problems. In: Fourth International Conference on Machine Learning and Applications, pp. 95–100 (2006)
21. Mercier, D., Cron, G., Denoeux, T., Masson, M.-H.: Fusion de décisions postales dans le cadre du modèle des croyances transférables. Traitement du Signal 24(2), 133–151 (2007)
22. Valente, F., Hermansky, H.: Combination of acoustic classifiers based on Dempster-Shafer theory of evidence. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, pp. 1129–1132 (2007)
23. Burger, T., Aran, O., Urankar, A., Akarun, L., Caplier, A.: A Dempster-Shafer theory based combination of classifiers for hand gesture recognition. In: Computer Vision and Computer Graphics - Theory and Applications. CCIS (2008)
24. Bi, Y., Guan, J., Bell, D.A.: The combination of multiple classifiers using an evidential reasoning approach. Artificial Intelligence 172(15), 1731–1751 (2008)
25. Aran, O., Burger, T., Caplier, A., Akarun, L.: A belief-based sequential fusion approach for fusing manual and non-manual signs. Pattern Recognition 42(5), 812–822 (2009)
26. Burger, T., Kessentini, Y., Paquet, T.: A tutorial on using Dempster-Shafer theory to combine probabilistic classifiers - application to hand gesture and handwriting recognition. Submitted to Journal of Zhejiang University, Elsevier (2011)
27. Haenni, R.: Are alternatives to Dempster's rule of combination alternatives? Int. J. Information Fusion 3, 237–241 (2002)
28. Quost, B., Masson, M., Denoeux, T.: Classifier fusion in the Dempster-Shafer framework using optimized t-norm based combination rules. International Journal of Approximate Reasoning (2010)
29. Pechwitz, M., Maddouri, S., Maergner, V., Ellouze, N., Amiri, H.: IFN/ENIT database of handwritten Arabic words. In: Colloque International Francophone sur l'Ecrit et le Document, pp. 129–136 (2002)
30. Grosicki, E., Carre, M., Brodin, J., Geoffrois, E.: Results of the RIMES evaluation campaign for handwritten mail processing. In: International Conference on Document Analysis and Recognition, pp. 941–945 (2009)
31. McNemar, Q.: Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12(2), 153–157 (1947)
